RNA targeting methods and compositions

ABSTRACT

Provided herein are CRISPR/Cas methods and compositions for targeting RNA molecules, which can be used to detect, edit, or modify a target RNA.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 17/711,924, filed Apr. 1, 2022, which is a continuation of U.S. application Ser. No. 17/352,551, filed Jun. 21, 2021, now U.S. Pat. No. 11,310,179, which is a continuation of U.S. application Ser. No. 16/257,493, filed Jan. 25, 2019, now U.S. Pat. No. 11,228,547, which is a divisional of U.S. application Ser. No. 15/937,699, filed Mar. 27, 2018, now U.S. Pat. No. 10,476,825, which claims priority to U.S. Provisional Application No. 62/548,846 filed Aug. 22, 2017, U.S. Provisional Application No. 62/572,963 filed Oct. 16, 2017, and U.S. Provisional Application No. 62/639,178, filed Mar. 6, 2018, all of which are herein incorporated by reference in their entireties.

ACKNOWLEDGMENT OF GOVERNMENT SUPPORT

This invention was made with government support under 5 DP5 OD021369-02 and 5 R21 AG056811-02 awarded by The National Institutes of Health. The government has certain rights in the invention.

FIELD

This disclosure relates to a CRISPR/Cas system for modifying (including detecting) RNA, which utilizes novel Cas13d proteins (also referred to as CasR and nCas1) and guide RNAs.

INCORPORATION OF ELECTRONIC SEQUENCE LISTING

The electronic sequence listing, submitted herewith as an XML file named 7158-99284-28.xml, created on May 26, 2023, ~650,581 bytes, is herein incorporated by reference in its entirety.

BACKGROUND

Mapping of transcriptome changes in cellular function and disease has been transformed by technological advances over the last two decades, from microarrays (Schena et al., 1995) to next-generation sequencing and single cell studies (Shendure et al., 2017). However, interrogating the function of individual transcript dynamics and establishing causal linkages between observed transcriptional changes and cellular phenotype requires the ability to actively control or modulate desired transcripts.

DNA engineering technologies such as CRISPR-Cas9 (Doudna and Charpentier, 2014; Hsu et al., 2014) enable researchers to dissect the function of specific genetic elements or correct disease-causing mutations. However, simple and scalable tools to study and manipulate RNA lag significantly behind their DNA counterparts. Existing RNA interference technologies, which enable cleavage or inhibition of desired transcripts, have significant off-target effects and remain challenging engineering targets due to their key role in endogenous processes (Birmingham et al., 2006; Jackson et al., 2003). As a result, methods for studying the functional role of RNAs directly have remained limited.

One of the key restrictions in RNA engineering has been the lack of RNA-binding domains that can be easily retargeted and introduced into target cells. The MS2 RNA-binding domain, for example, recognizes an invariant 21-nucleotide (nt) RNA sequence (Peabody, 1993), therefore requiring genomic modification to tag a desired transcript. Pumilio homology domains possess modular repeats with each protein module recognizing a separate RNA base, but they can only be targeted to short 8 nt RNA sequences (Cheong and Hall, 2006). While previously characterized type II (Batra et al., 2017; O'Connell et al., 2014) and VI (Abudayyeh et al., 2016; East-Seletsky et al., 2016) CRISPR-Cas systems can be reprogrammed to recognize 20-30 nt RNAs, their large size (~1200 amino acids, aa) makes them difficult to package into AAV for primary cell and in vivo delivery.

SUMMARY

This application provides bioinformatic analysis of prokaryotic genomes to identify sequence signatures of CRISPR-Cas repeat arrays and mine previously uncharacterized, compact Cas ribonucleases that can be used for RNA targeting tools. Engineered Type VI-D CRISPR effectors can be used to efficiently knock down endogenous RNAs in human cells and manipulate alternative splicing, paving the way for RNA targeting applications and further effector domain fusions as part of a transcriptome engineering toolbox.

Provided herein are methods of modifying one or more target RNA molecules, such as a clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR associated (Cas) system-mediated RNA editing method. Such methods can include contacting one or more target RNA molecules with a non-naturally occurring (e.g., does not naturally occur in the cell or system into which it is introduced) or engineered CRISPR-Cas system. Such a CRISPR-Cas system can include (1) at least one Cas13d protein or at least one Cas13d nucleic acid coding sequence (such as a mRNA or a vector encoding the at least one Cas13d protein) and (2) at least one CRISPR-Cas system guide nucleic acid molecule (such as a guide RNA, gRNA) that hybridizes with the one or more target RNA molecules, or at least one nucleic acid molecule encoding the gRNA. The Cas13d protein forms a complex with the gRNA, and the gRNA directs the complex to the one or more target RNA molecules and modifies (e.g., cuts, detects) the one or more target RNA molecules. In some examples, the one or more target RNA molecules (or a cell containing the one or more target RNA molecules) are contacted with a complex including the at least one Cas13d protein and the at least one gRNA. In some examples, the system includes Mg²⁺. However, in some examples, the system does not include Mg²⁺, such as if cleavage of the target RNA is not desired.

In some examples, contacting the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system includes introducing into a cell (such as a eukaryotic or prokaryotic cell) containing the one or more target RNA molecules the non-naturally occurring or engineered CRISPR-Cas system, for example using endocytosis, a liposome, a particle, an exosome, a microvesicle, a gene gun, electroporation, a virus, or combinations thereof. In some examples, contacting the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system includes contacting a cell-free system (such as a biological or environmental sample, or a cell lysate) containing the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system (for example in a diagnostic method to detect a target RNA).

In some examples, the at least one Cas13d protein includes one or more HEPN domains; is no more than 150 kD, no more than 140 kD, no more than 130 kD, no more than 120 kD, such as about 90 to 120 kD, about 100 to 120 kD or about 110 kD; includes one or more mutated HEPN domains, and can process the guide RNA, but cannot cleave or cut the one or more target RNA molecules; includes a Cas13d ortholog from a prokaryotic genome or metagenome, a gut metagenome, an activated sludge metagenome, an anaerobic digester metagenome, a chicken gut metagenome, a human gut metagenome, a pig gut metagenome, a bovine gut metagenome, a sheep gut metagenome, a goat gut metagenome, a capybara gut metagenome, a primate gut metagenome, a termite gut metagenome, a fecal metagenome, a genome from the Order Clostridiales, or the Family Ruminococcaceae; includes a Cas13d ortholog from Ruminococcus albus, Eubacterium siraeum, a Ruminococcus flavefaciens strain XPD3002, Ruminococcus flavefaciens FD-1, uncultured Eubacterium sp TS28-c4095, uncultured Ruminococcus sp., Ruminococcus bicirculans, or Ruminococcus sp CAG57; includes at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253; or combinations thereof.
In some examples, the at least one Cas13d protein has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253, and includes at least one motif shown in SEQ ID NO: 195, Motif 2, or Motif 3. In some examples, the at least one Cas13d protein further includes one or more other agents (e.g., is a fusion protein), such as one or more subcellular localization signals, one or more effector domains, or combinations thereof. In some examples, the at least one Cas13d protein that includes one or more HEPN domains is no more than 1500 aa, no more than 1200 aa, no more than 1100 aa, no more than 1000 aa, such as about 800 to 1500 aa, about 800 to 1250 aa or about 850 to 950 aa.

Also provided are isolated nucleic acid molecules encoding such Cas13d proteins, such as a cDNA, genomic DNA, RNA, or mRNA. Such isolated nucleic acid molecules can be part of a vector (such as a plasmid or viral vector), and can be operably linked to a promoter. In some examples, an isolated nucleic acid molecule encoding a Cas13d protein has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 124, 125, 126, 127, 128, 139, 140 or 141. In some examples, an isolated nucleic acid molecule encoding at least one Cas13d protein (which can be part of a vector) includes at least one Cas13d protein coding sequence codon optimized for expression in a eukaryotic cell, such as a human cell, for example a Cas13d coding sequence having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 142, 143, 144, or 145.

In some examples, the gRNA that hybridizes with the one or more target RNA molecules in a Cas13d-mediated manner includes one or more direct repeat (DR) sequences and one or more spacer sequences, such as one or more sequences comprising an array of DR-spacer-DR-spacer. In some examples, the one or more DR sequences have at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, 169, 176, 178, 180, 182, 184, 186, 188, 190, 191, 192, 193, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, or 254. In one example, the gRNA includes additional sequences, such as an aptamer sequence.

In some examples, a plurality of gRNAs are generated from a single array, wherein each gRNA can be different, for example target different RNAs or target multiple regions of a single RNA, or combinations thereof.

Methods of targeting one or more target RNA molecules are provided. In some examples, an entire RNA is targeted. In some examples, a portion of an RNA is targeted. Targeting an RNA molecule can include one or more of cutting or nicking one or more target RNA molecules, activating one or more target RNA molecules, deactivating the one or more target RNA molecules, visualizing or detecting the one or more target RNA molecules, labeling the one or more target RNA molecules, binding the one or more target RNA molecules, editing the one or more target RNA molecules, trafficking the one or more target RNA molecules, and masking the one or more target RNA molecules. In some examples, modifying one or more target RNA molecules includes one or more of an RNA base substitution, an RNA base deletion, an RNA base insertion, a break in the target RNA, methylating RNA, and demethylating RNA.

In some examples, such methods are used to treat a disease, such as a disease in a human. In such examples, the one or more target RNA molecules are associated with the disease.

Also provided are isolated proteins, including non-naturally occurring proteins. In some examples, a protein has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, or 113. In some examples, an isolated protein is a Cas13d ortholog from a prokaryotic genome or metagenome, a gut metagenome, an activated sludge metagenome, an anaerobic digester metagenome, a chicken gut metagenome, a human gut metagenome, a pig gut metagenome, a bovine gut metagenome, a sheep gut metagenome, a goat gut metagenome, a capybara gut metagenome, a primate gut metagenome, a termite gut metagenome, a fecal metagenome, a genome from the Order Clostridiales, or the Family Ruminococcaceae. In some examples, a Cas13d ortholog includes a Cas13d ortholog from Ruminococcus albus, Eubacterium siraeum, a Ruminococcus flavefaciens strain XPD3002, Ruminococcus flavefaciens FD-1, uncultured Eubacterium sp TS28-c4095, uncultured Ruminococcus sp., Ruminococcus bicirculans, or Ruminococcus sp CAG57. The protein is a Cas13d protein that further includes one or more other agents or domains (e.g., is a fusion protein), such as one or more subcellular localization signals, one or more effector domains, or combinations thereof.

Also provided are isolated guide RNA (gRNA) molecules. In some examples, an isolated gRNA includes one or more direct repeat (DR) sequences, such as an unprocessed (e.g., about 36 nt) or processed DR (e.g., about 30 nt). In some examples a DR has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, 169, 176, 178, 180, 182, 184, 186, 188, 190, 191, 192, 193, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, or 254. Such a gRNA can further include one or more spacer sequences specific for (e.g., complementary to) the target RNA.

Also provided are ribonucleoprotein (RNP) complexes, which include a Cas13d protein provided herein and a gRNA provided herein.

Also provided are recombinant cells that include any Cas13d protein (or nucleic acid molecule encoding Cas13d), any gRNA, any RNP complex, or any vector, provided herein. In one example, the cell is not a bacterial cell. In one example, the cell is a bacterial cell.

Also provided are compositions that include one or more of any Cas13d protein (or nucleic acid molecule encoding Cas13d), any gRNA, any RNP complex, any isolated nucleic acid molecule, any vector, or any cell, provided herein. Such compositions can include a pharmaceutically acceptable carrier.

Also provided are kits. Such kits can include one or more of any Cas13d protein (or nucleic acid molecule encoding Cas13d), any gRNA, any RNP complex, any isolated nucleic acid molecule, any vector, any cell, or any composition provided herein. Such reagents can be combined or in separate containers.

The foregoing and other objects and features of the disclosure will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B: Bioinformatic pipeline for the identification of the RNA-targeting class 2 CRISPR system Cas13d. (A) Schematic describing a computational pipeline for CRISPR system identification. A minimal definition for a putative class 2 CRISPR locus was used, requiring only a CRISPR repeat array and a nearby protein >750 aa in length. The initial search was performed on prokaryotic genome assemblies derived from NCBI Genome, and later expanded via TBLASTN of predicted Cas13d proteins against public metagenome sequences without predicted open reading frames. DR, direct repeat. (B) Phylogenetic classification and alignment of full-length Cas13d effectors and metagenomic fragments. Cas13d effectors and metagenomic Cas13d protein fragments cluster into several distinct branches, which are colored for ease of interpretation. Shading indicates residue conservation using the Blosum62 matrix. Full-length Cas13d effectors used in this study were sampled from distinct branches of the Cas13d family. Alignment of Cas13d proteins and protein fragments was performed using ClustalOmega 1.2.4 and maximum-likelihood tree building was performed with PhyML 3.2.
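The minimal locus definition in panel (A) can be illustrated with a short filter. This is only a sketch under stated assumptions: the `RepeatArray` and `Protein` records, the function names, and the 20 kb proximity window are hypothetical (the caption says only "nearby"), and upstream repeat-array detection and protein prediction are taken as given inputs.

```python
from dataclasses import dataclass

MIN_PROTEIN_AA = 750       # from the caption: nearby protein >750 aa in length
MAX_DISTANCE_BP = 20_000   # ASSUMPTION: the caption does not define "nearby"


@dataclass(frozen=True)
class RepeatArray:
    """Hypothetical record for a detected CRISPR repeat array."""
    contig: str
    start: int
    end: int


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for a predicted protein-coding region."""
    contig: str
    start: int
    end: int
    length_aa: int


def gap_bp(array, protein):
    """Distance in bp between the two intervals (0 if they overlap)."""
    return max(0, max(array.start, protein.start) - min(array.end, protein.end))


def putative_class2_loci(arrays, proteins,
                         min_aa=MIN_PROTEIN_AA, max_gap=MAX_DISTANCE_BP):
    """Pair each repeat array with same-contig proteins that are long and nearby."""
    hits = []
    for array in arrays:
        for protein in proteins:
            if protein.contig != array.contig:
                continue
            if protein.length_aa > min_aa and gap_bp(array, protein) <= max_gap:
                hits.append((array, protein))
    return hits
```

Keeping the size threshold and the distance window as parameters makes it easy to see how sensitive the candidate list is to the assumed window.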

FIGS. 2A-2C: Type VI CRISPR-Cas13d is a family of single effector CRISPR ribonucleases. (A) Maximum-likelihood phylogenetic tree of Cas13d effectors used herein, with the full Cas13d CRISPR locus depicted along with conserved HEPN RNase domains. Grey rectangles denote CRISPR direct repeats (DRs) and blue diamonds indicate spacer sequences. (B) RNA sequencing of a heterologously expressed Cas13d locus from an uncultured Ruminococcus sp. sample. Mature gRNAs mapping to the CRISPR array indicate a processed nt DR and a variable spacer length from 14-26 nt (SEQ ID NO: 278). Co-fold analysis of direct repeat truncation indicates a strong hairpin structure. (C) Purified E. siraeum Cas13d and catalytically dead Cas13d (dCas13d) protein are each sufficient to process a guide array into its two component gRNAs. Addition of EDTA does not impair gRNA processing. ‘d’, dCas13d (R295A, H300A, R849A, H854A).

FIGS. 3A-3D. Phylogenetic classification of RNA-targeting class 2 CRISPR effectors and sequence conservation within the Cas13d family. (A) HEPN motif conservation in Cas13d effectors used herein with conserved residues shaded according to Blosum62. Consensus sequence of HEPN1 domain region (SEQ ID NO: 279) and alignment of HEPN1 domain region of seven Cas13d proteins (SEQ ID NOS: 280 to 286). Consensus sequence of HEPN2 domain region (SEQ ID NO: 287) and alignment of HEPN2 domain region of seven Cas13d proteins (SEQ ID NOS: 288 to 294). The RxxxxH HEPN motif is highlighted. (B) Maximum-likelihood tree of type VI CRISPR-Cas families. Average amino acid lengths of Type VI Cas13 superfamily effectors are indicated in red. Alignment of previously described class 2 CRISPR RNA-targeting proteins (Abudayyeh et al., 2017; Cox et al., 2017; East-Seletsky et al., 2017; East-Seletsky et al., 2016; Smargon et al., 2017) and Cas13d effectors was performed using MAFFT 7.38 and maximum-likelihood tree building was performed with PhyML 3.2. Branch labels and scale bar indicate substitutions per site. (C) Predicted Cas13d direct repeat RNA secondary structure. (D) Sequence logo of full length 36 nt Cas13d direct repeats.
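As an illustration of the RxxxxH HEPN motif named in panel (A), the sketch below scans a protein sequence for an arginine followed by any four residues and then a histidine. This spacing matches the R295/H300 and R849/H854 residue pairs listed for E. siraeum dCas13d. The function name and the test peptide are hypothetical, and a pattern hit is only a candidate: short motifs of this kind occur by chance, so this is not a substitute for the domain alignment shown in the figure.

```python
import re

# R, any four residues, then H. The lookahead lets overlapping hits be reported.
HEPN_MOTIF = re.compile(r"(?=(R.{4}H))")


def find_hepn_motifs(protein_seq):
    """Return (1-based position of the R, matched 6-mer) for each RxxxxH hit."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HEPN_MOTIF.finditer(seq)]


# Hypothetical peptide, not a real Cas13d sequence.
candidates = find_hepn_motifs("MKRAAAAHGGRNGLFHK")
```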

FIGS. 4A-4C: Purification of recombinant Cas13d protein. EsCas13d was expressed as N-terminal His-MBP fusion and purified by successive affinity, cation exchange, and size exclusion chromatography. The His-tag was removed by TEV protease cleavage. (A) Chromatogram from Superdex 200 column for EsCas13d. (B) SDS-PAGE gel of size exclusion chromatography fractions for E. siraeum Cas13d. (C) SDS-PAGE gel of purified E. siraeum Cas13d and dCas13d (R295A, H300A, R849A, H854A mutations of predicted catalytic residues in both HEPN motifs).

FIGS. 5A-5D: Programmable RNA targeting by Cas13d in vitro. (A) E. siraeum Cas13d requires a matching guide array or mature gRNA to efficiently cleave complementary ssRNA targets. Denaturing gel depicts cleavage reactions incubated at 37° C. for 1 hour. NT, non-targeting. (B) Substitution with dCas13d or addition of EDTA abrogates Cas13d-mediated RNA targeting with both the guide and array. ‘d’, dead Cas13d. (C) Denaturing gel depicting guide-target match dependent activation of Cas13d cleavage activity. Scrambled target RNA (‘A’) is fluorescently labeled, while guide-complementary activator target RNA (‘B’) is unlabeled. RNA cleavage activity is abolished by the individual removal of guide RNA or complementary target RNA, as well as the addition of EDTA or the catalytic inactivation of Cas13d (indicated as ‘d’). (D) A model for guide and target-dependent activation of Cas13d RNase activity. The ternary Cas13d:gRNA:target RNA complex is capable of cleaving the complementary target RNA or bystander RNAs.

FIGS. 6A-6H. In vitro characterization of Cas13d properties. (A) Schematic showing the length and sequence of gRNA spacer truncations and spacer position relative to the complementary ssRNA target (from top to bottom, SEQ ID NOS: 295 to 309). (B) Denaturing gel depicting EsCas13d cleavage activity of target RNA with different spacer lengths. (C) Denaturing gel depicting EsCas13d cleavage reactions paired with 12 guides from FIG. 3A tiling a complementary ssDNA version of the ssRNA target. (D) Denaturing gel depicting cleavage reactions using EsCas13d paired with the same 12 guides tiling a dsDNA version of the complementary target. (E) Quantification of cleavage efficiency from FIG. 3A. Each PFS base is the average of 3 different spacer sequences tiling a complementary target RNA. Cleavage percentage is determined by the ratio of cleaved band intensity divided by total lane intensity. Mean is depicted ±SD with each data point representing an independent replicate. (F) Cas13d-mediated cleavage of target RNA carrying different PFS bases given an invariant spacer sequence. Quantification of Cas13d cleavage efficiency and a representative denaturing gel depicting EsCas13d cleavage activity are shown. Differences are not significant (one-way ANOVA, P=0.768). Cleavage percentage is determined as above, and mean is depicted ±SD with n=3. (G) and (H) Optimal temperature range for Cas13d activity. Denaturing gels depicting EsCas13d cleavage activity at temperatures ranging from 16-62° C. for two different target RNAs.
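The cleavage percentage defined in panels (E) and (F), cleaved band intensity divided by total lane intensity, can be written out as a small calculation. This is a minimal sketch: the function names and the intensity values are hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption, since the caption states only "±SD".

```python
from statistics import mean, stdev


def cleavage_percent(cleaved_band_intensity, total_lane_intensity):
    """Cleaved band intensity as a percentage of total lane intensity."""
    if total_lane_intensity <= 0:
        raise ValueError("total lane intensity must be positive")
    return 100.0 * cleaved_band_intensity / total_lane_intensity


def summarize_replicates(replicates):
    """Mean and sample SD of cleavage percent over (cleaved, total) pairs."""
    values = [cleavage_percent(cleaved, total) for cleaved, total in replicates]
    # ASSUMPTION: sample SD (n-1); the caption does not say which SD is used.
    return mean(values), stdev(values)


# Hypothetical densitometry readings for n=3 replicates (arbitrary units).
avg, sd = summarize_replicates([(10, 100), (20, 100), (30, 100)])
```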

FIGS. 7A-7B: Characterization of Cas13d target substrate preference. (A) Cas13d can be generalizably reprogrammed with multiple guides and does not exhibit a protospacer flanking sequence (PFS) requirement. RNA cleavage by EsCas13d and 12 guides tiling the target RNA is shown. Control lanes are from a separate gel run in parallel. (B) Cas13d preferentially cleaves uracil bases in the loop of a hairpin or a linear homopolymer repeat, which is interrupted every 5 nt by a transition mutation (X) to enable synthesis.

FIGS. 8A-8D: RNA knockdown activity screen of engineered Cas13d orthologs in human cells. (A) Schematic for mammalian expression constructs encoding for engineered Cas13d effectors and guides. NLS, nuclear localization signal. pre-gRNA, artificial unprocessed guide RNA containing a single 30 nt spacer sequence flanked by 2 full length 36 nt DRs. gRNA, predicted mature guide RNA with a single 30 nt processed DR and 22 nt spacer sequence. (B) Heatmap of mCherry protein knockdown in a Cas13d ortholog activity screen in human HEK 293FT cells using pools of 4 pre-gRNAs or gRNAs. Normalized MFI, median fluorescent intensity relative to non-targeting condition. Positions in gray were not tested, with n=3. (C) Immunocytochemistry of Cas13d showing localization and expression of engineered constructs. Scale bar, 10 μm. Blue pseudocolor, DAPI staining of nuclei. (D) Comparison of Adm and Rfx Cas13d ortholog constructs for knockdown of endogenous B4GALNT1 mRNA reveals RfxCas13d-NLS (CasRx) to be most effective for both guide RNA architectures. Pools of 4 guides were used for targeting. NT, non-targeting. Values are mean±SEM with n=3.
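The two guide architectures described in panel (A) can be sketched as simple string assembly with length checks. The lengths (36 nt full DR, 30 nt processed DR, 30 nt and 22 nt spacers) come from the caption; the function names and placeholder sequences are hypothetical and are not real direct repeats or spacers. Placing the processed DR before the spacer follows the DR-spacer array order given in the Summary and is otherwise an assumption.

```python
FULL_DR_NT = 36          # full length DR (caption)
PROCESSED_DR_NT = 30     # processed DR in the predicted mature gRNA (caption)
PRE_GRNA_SPACER_NT = 30  # spacer in the pre-gRNA (caption)
GRNA_SPACER_NT = 22      # spacer in the predicted mature gRNA (caption)


def _check(name, seq, expected_len):
    """Validate RNA alphabet and length, returning the upper-cased sequence."""
    seq = seq.upper()
    if set(seq) - set("ACGU"):
        raise ValueError(f"{name} must contain only A, C, G, U")
    if len(seq) != expected_len:
        raise ValueError(f"{name} must be {expected_len} nt, got {len(seq)}")
    return seq


def build_pre_grna(full_dr, spacer):
    """Pre-gRNA: one 30 nt spacer flanked by two full length 36 nt DRs."""
    dr = _check("full DR", full_dr, FULL_DR_NT)
    sp = _check("spacer", spacer, PRE_GRNA_SPACER_NT)
    return dr + sp + dr


def build_grna(processed_dr, spacer):
    """Mature gRNA: 30 nt processed DR plus 22 nt spacer.

    ASSUMPTION: DR precedes spacer, following the DR-spacer array order.
    """
    dr = _check("processed DR", processed_dr, PROCESSED_DR_NT)
    sp = _check("spacer", spacer, GRNA_SPACER_NT)
    return dr + sp


# Placeholder sequences only (hypothetical, not real DRs or spacers).
pre_grna = build_pre_grna("ACGU" * 9, "G" * 30)
grna = build_grna("ACGU" * 7 + "AC", "U" * 22)
```

The processed DR is taken as an input here rather than derived by trimming the full DR, because the caption does not say which end is removed during processing.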

FIGS. 9A-9H: CasRx mediates efficient and specific knockdown of diverse human coding and noncoding transcripts. (A) Multiple guide RNAs tiling a target transcript can be expressed as a single array and processed by RfxCas13d-NLS (CasRx) into individual gRNAs within the same cell. (B) Arrays of 4 guides each mediate target knockdown by CasRx in 293FT cells via transient transfection. Knockdown relative to GFP vehicle control was determined by qPCR, with n=3. (C) Schematic of CasRx target sequences and spacer position-matched shRNAs. (D) Relative target RNA knockdown by individual position-matched shRNAs and CasRx gRNAs. NT, non-targeting. CRISPRi, dCas9-mediated CRISPR interference for transcriptional repression (n=3). (E) Volcano plot of differential transcript levels between B4GALNT1 targeting and non-targeting (NT) shRNAs as determined by RNA sequencing (n=3). 542 non-specific transcript changes were identified. (F) Volcano plot of differential transcript levels between B4GALNT1-targeting CasRx and non-targeting (NT) guide. Targeting guide position is matched to the shRNA shown in (E). B4GALNT1 was the only transcript exhibiting a significant change, with n=3. (G) Summary of significant off-target transcript perturbations by matched shRNAs and CasRx guides. (H) CasRx targeting of 11 endogenous transcripts, each with 3 guides and a non-targeting (NT) guide in 293FT cells. Transcript levels are relative to GFP vehicle control, mean±SEM with n=3.

FIGS. 10A-10D. Comparison of engineered Cas13 superfamily effectors for targeted knockdown and splicing. (A) Relative target RNA knockdown by individual position-matched gRNAs for CasRx, NLS-LwaCas13a-msfGFP (Abudayyeh et al., 2017) and PspCas13b-NES (Cox et al., 2017) in HEK 293FT cells. NT, non-targeting. Values are mean±SEM with n=3. (B) Comparison of Cas13 median knockdown efficiencies. n=3 per guide RNA. **** indicates P<0.0001 according to Friedman's test. (C) Exon exclusion by catalytically inactive NLS-dCas13a-msfGFP on the bichromatic splicing reporter. Guides are position-matched to those reported in FIG. 6B for CasRx. Values are mean±SEM with n=3. (D) Comparison of splicing modulation by NLS-dCas13a-msfGFP and CasRx. Fold change in targeted exon exclusion relative to non-targeting guide is shown. **** indicates P<0.0001 according to two-way ANOVA.

FIGS. 11A-11B. RNA sequencing from CasRx and shRNA targeting of ANXA4 in human cells. (A) Volcano plots of differential transcript levels between ANXA4 targeting and non-targeting (NT) shRNAs as determined by RNA sequencing (n=3). 915 non-specific transcript changes were identified. (B) Volcano plot of differential transcript levels for an ANXA4 targeting CasRx array used in FIG. 9B containing a guide position matched to the shRNA shown in (A) and a non-targeting (NT) array. ANXA4 was the only transcript exhibiting significant downregulation with n=3. HIST2HBE was the only transcript identified to exhibit significant upregulation. H2B is a dimer partner of H2AX (Du et al., 2006) which has been shown to interact with ANXA4 (Yang et al., 2010).

FIGS. 12A-12F: AAV delivery of catalytically inactive dCasRx splice effectors to manipulate alternative splicing. (A) Schematic of bichromatic exon skipping reporter. +1 and +3, reading frame. BP, intronic branch point-targeting guide. SA, splice acceptor site-overlapping guide. EX, exonic guide. SD, splice donor site-overlapping guide. AUG, start codon. UGA, stop codon. Inclusion of the second exon leads to an out-of-frame (+3), non-fluorescent translation of dsRed followed by in-frame mTagBFP2. Exclusion of the targeted exon leads to an in-frame translation of dsRed (+1) followed by a stop codon. (B) Induced exon exclusion by dCasRx and an N-terminal hnRNPa1-dCasRx fusion protein targeted to pre-mRNA. The Gly-rich C-terminal domain of hnRNPa1 is used as the effector domain. Exon skipping efficiency is depicted as a relative percentage of cells carrying primarily the dsRed or BFP isoform, determined through flow cytometry. NLS, nuclear localization signal. ‘A’, CRISPR array carrying all 4 guides. Values are mean±SEM with n=3. (C) AAV design carrying dCasRx and a three-guide array with total transgene size <4.3 kb, including AAV inverted terminal repeats (ITRs). (D) Schematic of frontotemporal dementia (FTD) disease modeling. Neurons are generated via Neurogenin-2 (Ngn2) directed differentiation of patient-derived and control iPSCs followed by transduction with dCasRx or vehicle control AAV (EFS-mTagBFP2). (E) FTD is associated with SNPs in a putative intronic splice enhancer following exon 10 of the MAPT transcript encoding for tau. Alternative splicing of MAPT exon 10 results in 4R tau (by inclusion) and 3R tau (by exclusion). SNPs in the intronic splice enhancer including the indicated IVS 10+16 mutation result in increased exon inclusion and higher levels of 4R tau. To facilitate reduction of 4R tau levels, gRNAs contained in a dCasRx array were targeted to the exon 10 splice acceptor (g1) as well as two putative exonic splice enhancers indicated in purple (g2, g3).
(F) Relative 4R/3R tau transcript ratios in differentiated neurons were assayed via qPCR at 14 days following transduction with AAV. FTD, frontotemporal dementia cells carrying IVS 10+16. Values are mean±S.D. with n=3. **** indicates P<0.0001.

FIG. 13 is a bar graph showing RNA targeting in human cells using the disclosed methods.

FIG. 14 is a schematic drawing showing how the disclosed Cas13d and DRs can be used to achieve alternative splicing.

FIGS. 15A-15D are a series of panels showing: (A) Cas13d is converted into an active RNase complex upon binding a target matching the spacer sequence of the guide RNA. It is capable of cleaving gRNA-complementary target RNA or non-complementary bystander RNAs. (B) Cas13d target-dependent RNase activity can be converted into a detectable signal, for example through cleavage of a labeled detector RNA that is cleaved only in the presence of a target matching the spacer of the Cas13d guide RNA. In this example, the detector RNA contains a fluorophore, ‘F’, and a quencher ‘Q’, that abolishes fluorescence. Only upon bystander RNA cleavage is the fluorophore liberated from the quencher and fluorescence is generated. (C) Cas13d from E. siraeum produces a visible signal only in the presence of a perfectly matched target and not in the presence of different mismatched targets. (D) Cas13d from R. flavefaciens strain XPD3002 produces a visible signal only in the presence of a perfectly matched target and not in the presence of different mismatched targets.

FIGS. 16A-16B: (A) Alignment of 7 orthologs shows regions with high (green bars) and low (red bars) conservation. Regions selected for deletion are marked 1-10. (B) Knockdown (top) and splicing (bottom) activity of full-length CasRx and CasRx deletion variants. Deletion variant 5 is shown to retain full activity, demonstrating the feasibility of deleting areas of low conservation while retaining full activity.

FIGS. 17A and 17B show targeting of ccdB in bacterial cells. (A) Construct introduced into bacterial cells expressing ccdB, and (B) relative expression of ccdB under various conditions.

FIGS. 18A-18MMM show an alignment of 53 different Cas13d proteins as follows: SEQ ID NOs: 310 (037_-_emb|OIZA01000315.1), 183 (emb|OCTW011587266.1), 189 (k87_11092736), 220 (BMZ-11B_GL0037915), 218 (BMZ-11B_GL0037771), 222 (BMZ-11B_GL0069617), 229 (Ga0099364_10024192), 216 (530373_GL0023589), 177 (emb|ODAI011611274.1), 200 (EMG_10003641), 139 (CasR_P1E0_metageno), 179 (emb|OIZX01000427.1), 208 (160582958_gene49834), 166 (gi|1198542314|gb|NFIR01000008.1), 185 (emb|OGNF01009141.1), 202 (Ga0129306_1000735), 239 (MH0288_GL0082219), 311 (PIG-022_GL0026351), 249 (PIG-028_GL0185479), 210 (250twins_35838_GL0110300), 243 (PIG-014_GL0226364), 212 (250twins_36050_GL0158985), 175 (emb|OJMM01002900.1), 164 (emb|OGPN01002610.1), 160 (emb|OHCP01000044.1), 312 (PIG-046_GL0077813), 313 (pig_chimera), 187 (emb|OIEN01002196.1), 241 (O2.UC29-0_GL0096317), 140 (CasR_Anaerobic_dig), 162 (emb|OGDF01008514.1), 155 (emb|OGZC01000639.1), 206 (Ga0224415_10048792_chimera), 181 (emb|OCVV012889144.1), 231 (Ga0187910_10006931), 92 (R._flav_XPD_), 198 (Ga0224415_10007274), 237 (Ga0187911_10069260), 233 (Ga0187910_10015336), 253 (ODAI_chimera), 214 (31009_GL0034153), 224 (DLF014_GL0011914_), 127 (CasR_R._albus), 235 (Ga0187910_10040531), 153 (tpg|DJXD01000002.1), 125 (CasR_E._siraeum), 245 (PIG-018_GL0023397), 247 (PIG-025_GL0099734), 204 (Ga0129317_1008067_chimera), 226 (EYZ-362B_GL0088915), 3 (uncultured_Ru_sp), 126 (R._flav._FD-1), and 149 (tpg|DBYI01000091.1), from top to bottom.

SEQUENCE LISTING

The nucleic and amino acid sequences listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and three letter code for amino acids, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand.

SEQ ID NO: 1 is an exemplary Cas13d sequence from Eubacterium siraeum containing a HEPN site.

SEQ ID NO: 2 is an exemplary Cas13d sequence from Eubacterium siraeum containing a mutated HEPN site.

SEQ ID NO: 3 is an exemplary Cas13d sequence from uncultured Ruminococcus sp. containing a HEPN site.

SEQ ID NO: 4 is an exemplary Cas13d sequence from uncultured Ruminococcus sp. containing a mutated HEPN site.

SEQ ID NO: 5 is an exemplary Cas13d sequence from Gut_metagenome_contig2791000549.

SEQ ID NO: 6 is an exemplary Cas13d sequence from Gut_metagenome_contig855000317.

SEQ ID NO: 7 is an exemplary Cas13d sequence from Gut_metagenome_contig3389000027.

SEQ ID NO: 8 is an exemplary Cas13d sequence from Gut_metagenome_contig8061000170.

SEQ ID NO: 9 is an exemplary Cas13d sequence from Gut_metagenome_contig1509000299.

SEQ ID NO: 10 is an exemplary Cas13d sequence from Gut_metagenome_contig9549000591.

SEQ ID NO: 11 is an exemplary Cas13d sequence from Gut_metagenome_contig71000500.

SEQ ID NO: 12 is an exemplary Cas13d sequence from human gut metagenome.

SEQ ID NO: 13 is an exemplary Cas13d sequence from Gut_metagenome_contig3915000357.

SEQ ID NO: 14 is an exemplary Cas13d sequence from Gut_metagenome_contig4719000173.

SEQ ID NO: 15 is an exemplary Cas13d sequence from Gut_metagenome_contig6929000468.

SEQ ID NO: 16 is an exemplary Cas13d sequence from Gut_metagenome_contig7367000486.

SEQ ID NO: 17 is an exemplary Cas13d sequence from Gut_metagenome_contig7930000403.

SEQ ID NO: 18 is an exemplary Cas13d sequence from Gut_metagenome_contig993000527.

SEQ ID NO: 19 is an exemplary Cas13d sequence from Gut_metagenome_contig6552000639.

SEQ ID NO: 20 is an exemplary Cas13d sequence from Gut_metagenome_contig11932000246.

SEQ ID NO: 21 is an exemplary Cas13d sequence from Gut_metagenome_contig12963000286.

SEQ ID NO: 22 is an exemplary Cas13d sequence from Gut_metagenome_contig2952000470.

SEQ ID NO: 23 is an exemplary Cas13d sequence from Gut_metagenome_contig451000394.

SEQ ID NO: 24 is an exemplary Cas13d sequence from Eubacterium_siraeum_DSM_15702.

SEQ ID NO: 25 is an exemplary Cas13d sequence from gut_metagenome_P19E0k2120140920,c369000003.

SEQ ID NO: 26 is an exemplary Cas13d sequence from Gut_metagenome_contig7593000362.

SEQ ID NO: 27 is an exemplary Cas13d sequence from Gut_metagenome_contig12619000055.

SEQ ID NO: 28 is an exemplary Cas13d sequence from Gut_metagenome_contig1405000151.

SEQ ID NO: 29 is an exemplary Cas13d sequence from Chicken_gut_metagenome_c298474.

SEQ ID NO: 30 is an exemplary Cas13d sequence from Gut_metagenome_contig1516000227.

SEQ ID NO: 31 is an exemplary Cas13d sequence from Gut_metagenome_contig1838000319.

SEQ ID NO: 32 is an exemplary Cas13d sequence from Gut_metagenome_contig13123000268.

SEQ ID NO: 33 is an exemplary Cas13d sequence from Gut_metagenome_contig5294000434.

SEQ ID NO: 34 is an exemplary Cas13d sequence from Gut_metagenome_contig6415000192.

SEQ ID NO: 35 is an exemplary Cas13d sequence from Gut_metagenome_contig6144000300.

SEQ ID NO: 36 is an exemplary Cas13d sequence from Gut_metagenome_contig9118000041.

SEQ ID NO: 37 is an exemplary Cas13d sequence from Activated_sludge_metagenome_transcript_124486.

SEQ ID NO: 38 is an exemplary Cas13d sequence from Gut_metagenome_contig1322000437.

SEQ ID NO: 39 is an exemplary Cas13d sequence from Gut_metagenome_contig4582000531.

SEQ ID NO: 40 is an exemplary Cas13d sequence from Gut_metagenome_contig9190000283.

SEQ ID NO: 41 is an exemplary Cas13d sequence from Gut_metagenome_contig1709000510.

SEQ ID NO: 42 is an exemplary Cas13d sequence from M24_(LSQX01212483_Anaerobic_digester_metagenome) with a HEPN domain.

SEQ ID NO: 43 is an exemplary Cas13d sequence from Gut_metagenome_contig3833000494.

SEQ ID NO: 44 is an exemplary Cas13d sequence from Activated_sludge_metagenome_transcript_117355.

SEQ ID NO: 45 is an exemplary Cas13d sequence from Gut_metagenome_contig11061000330.

SEQ ID NO: 46 is an exemplary Cas13d sequence from Gut_metagenome_contig338000322 from sheep gut metagenome.

SEQ ID NO: 47 is an exemplary Cas13d sequence from human gut metagenome.

SEQ ID NO: 48 is an exemplary Cas13d sequence from Gut_metagenome_contig9530000097.

SEQ ID NO: 49 is an exemplary Cas13d sequence from Gut_metagenome_contig1750000258.

SEQ ID NO: 50 is an exemplary Cas13d sequence from Gut_metagenome_contig5377000274.

SEQ ID NO: 51 is an exemplary Cas13d sequence from gut_metagenome_P19E0k2120140920_c248000089.

SEQ ID NO: 52 is an exemplary Cas13d sequence from Gut_metagenome_contig11400000031.

SEQ ID NO: 53 is an exemplary Cas13d sequence from Gut_metagenome_contig7940000191.

SEQ ID NO: 54 is an exemplary Cas13d sequence from Gut_metagenome_contig6049000251.

SEQ ID NO: 55 is an exemplary Cas13d sequence from Gut_metagenome_contig1137000500.

SEQ ID NO: 56 is an exemplary Cas13d sequence from Gut_metagenome_contig9368000105.

SEQ ID NO: 57 is an exemplary Cas13d sequence from Gut_metagenome_contig546000275.

SEQ ID NO: 58 is an exemplary Cas13d sequence from Gut_metagenome_contig7216000573.

SEQ ID NO: 59 is an exemplary Cas13d sequence from Gut_metagenome_contig4806000409.

SEQ ID NO: 60 is an exemplary Cas13d sequence from Gut_metagenome_contig10762000480.

SEQ ID NO: 61 is an exemplary Cas13d sequence from Gut_metagenome_contig4114000374.

SEQ ID NO: 62 is an exemplary Cas13d sequence from Ruminococcus_flavefaciens_FD1.

SEQ ID NO: 63 is an exemplary Cas13d sequence from Gut_metagenome_contig7093000170.

SEQ ID NO: 64 is an exemplary Cas13d sequence from Gut_metagenome_contig11113000384.

SEQ ID NO: 65 is an exemplary Cas13d sequence from Gut_metagenome_contig6403000259.

SEQ ID NO: 66 is an exemplary Cas13d sequence from Gut_metagenome_contig6193000124.

SEQ ID NO: 67 is an exemplary Cas13d sequence from Gut_metagenome_contig721000619.

SEQ ID NO: 68 is an exemplary Cas13d sequence from Gut_metagenome_contig1666000270.

SEQ ID NO: 69 is an exemplary Cas13d sequence from Gut_metagenome_contig2002000411.

SEQ ID NO: 70 is an exemplary Cas13d sequence from Ruminococcus_albus.

SEQ ID NO: 71 is an exemplary Cas13d sequence from Gut_metagenome_contig13552000311.

SEQ ID NO: 72 is an exemplary Cas13d sequence from Gut_metagenome_contig10037000527.

SEQ ID NO: 73 is an exemplary Cas13d sequence from Gut_metagenome_contig238000329.

SEQ ID NO: 74 is an exemplary Cas13d sequence from Gut_metagenome_contig2643000492.

SEQ ID NO: 75 is an exemplary Cas13d sequence from Gut_metagenome_contig874000057.

SEQ ID NO: 76 is an exemplary Cas13d sequence from Gut_metagenome_contig4781000489.

SEQ ID NO: 77 is an exemplary Cas13d sequence from Gut_metagenome_contig12144000352.

SEQ ID NO: 78 is an exemplary Cas13d sequence from Gut_metagenome_contig5590000448.

SEQ ID NO: 79 is an exemplary Cas13d sequence from Gut_metagenome_contig9269000031.

SEQ ID NO: 80 is an exemplary Cas13d sequence from Gut_metagenome_contig8537000520.

SEQ ID NO: 81 is an exemplary Cas13d sequence from Gut_metagenome_contig1845000130.

SEQ ID NO: 82 is an exemplary Cas13d sequence from gut_metagenome_P13E0k2120140920_c3000072.

SEQ ID NO: 83 is an exemplary Cas13d sequence from gut_metagenome_P1E0k2120140920_c1000078.

SEQ ID NO: 84 is an exemplary Cas13d sequence from Gut_metagenome_contig12990000099.

SEQ ID NO: 85 is an exemplary Cas13d sequence from Gut_metagenome_contig525000349.

SEQ ID NO: 86 is an exemplary Cas13d sequence from Gut_metagenome_contig7229000302.

SEQ ID NO: 87 is an exemplary Cas13d sequence from Gut_metagenome_contig3227000343.

SEQ ID NO: 88 is an exemplary Cas13d sequence from Gut_metagenome_contig7030000469.

SEQ ID NO: 89 is an exemplary Cas13d sequence from Gut_metagenome_contig5149000068.

SEQ ID NO: 90 is an exemplary Cas13d sequence from Gut_metagenome_contig400200045.

SEQ ID NO: 91 is an exemplary Cas13d sequence from Gut_metagenome_contig10420000446.

SEQ ID NO: 92 is an exemplary Cas13d sequence from new_flavefaciens,strain_XPD3002.

SEQ ID NO: 93 is an exemplary Cas13d sequence from M26_Gut_metagenome_contig698000307.

SEQ ID NO: 94 is an exemplary Cas13d sequence from M36_Uncultured_Eubacterium_sp_TS28_c40956.

SEQ ID NO: 95 is an exemplary Cas13d sequence from M12_gut_metagenome_P25C0k2120140920_c134000066.

SEQ ID NO: 96 is an exemplary Cas13d sequence from human gut metagenome.

SEQ ID NO: 97 is an exemplary Cas13d sequence from M10_gut_metagenome_P25C90k2120140920,_c28000041.

SEQ ID NO: 98 is an exemplary Cas13d sequence from M11_gut_metagenome_P25C7k2120140920_c4078000105.

SEQ ID NO: 99 is an exemplary Cas13d sequence from gut_metagenome_P25C0k2120140920_c32000045.

SEQ ID NO: 100 is an exemplary Cas13d sequence from M13_gut_metagenome_P23C7k2120140920_c3000067.

SEQ ID NO: 101 is an exemplary Cas13d sequence from M5_gut_metagenome_P18E90k2120140920.

SEQ ID NO: 102 is an exemplary Cas13d sequence from M21_gut_metagenome_P18E0k2120140920.

SEQ ID NO: 103 is an exemplary Cas13d sequence from M7_gut_metagenome_P38C7k2120140920_c4841000003.

SEQ ID NO: 104 is an exemplary Cas13d sequence from Ruminococcus_bicirculans.

SEQ ID NO: 105 is an exemplary Cas13d sequence.

SEQ ID NO: 106 is an exemplary Cas13d consensus sequence.

SEQ ID NO: 107 is an exemplary Cas13d sequence from M18_gut_metagenome_P22E0k2120140920_c3395000078.

SEQ ID NO: 108 is an exemplary Cas13d sequence from M17_gut_metagenome_P22E90k2120140920_c114.

SEQ ID NO: 109 is an exemplary Cas13d sequence from Ruminococcus_sp_CAG57.

SEQ ID NO: 110 is an exemplary Cas13d sequence from gut_metagenome_P11E90k2120140920_c43000123.

SEQ ID NO: 111 is an exemplary Cas13d sequence from M6_gut_metagenome_P13E90k2120140920_c7000009.

SEQ ID NO: 112 is an exemplary Cas13d sequence from M19_gut_metagenome_P17E90k2120140920.

SEQ ID NO: 113 is an exemplary Cas13d sequence from gut_metagenome_P17E0k2120140920,_c 87000043.

SEQ ID NO: 114 is an exemplary human codon-optimized Eubacterium siraeum Cas13d nucleic acid sequence.

SEQ ID NO: 115 is an exemplary human codon-optimized Eubacterium siraeum Cas13d nucleic acid sequence with a mutant HEPN domain.

SEQ ID NO: 116 is an exemplary human codon-optimized Eubacterium siraeum Cas13d nucleic acid sequence with N-terminal NLS.

SEQ ID NO: 117 is an exemplary human codon-optimized Eubacterium siraeum Cas13d nucleic acid sequence with N- and C-terminal NLS tags.

SEQ ID NO: 118 is an exemplary human codon-optimized uncultured Ruminococcus sp. Cas13d nucleic acid sequence.

SEQ ID NO: 119 is an exemplary human codon-optimized uncultured Ruminococcus sp. Cas13d nucleic acid sequence with a mutant HEPN domain.

SEQ ID NO: 120 is an exemplary human codon-optimized uncultured Ruminococcus sp. Cas13d nucleic acid sequence with N-terminal NLS.

SEQ ID NO: 121 is an exemplary human codon-optimized uncultured Ruminococcus sp. Cas13d nucleic acid sequence with N- and C-terminal NLS tags.

SEQ ID NO: 122 is an exemplary human codon-optimized uncultured Ruminococcus flavefaciens FD1 Cas13d nucleic acid sequence.

SEQ ID NO: 123 is an exemplary human codon-optimized uncultured Ruminococcus flavefaciens FD1 Cas13d nucleic acid sequence with mutated HEPN domain.

SEQ ID NO: 124 is an exemplary Cas13d nucleic acid sequence from Ruminococcus bicirculans.

SEQ ID NO: 125 is an exemplary Cas13d nucleic acid sequence from Eubacterium siraeum.

SEQ ID NO: 126 is an exemplary Cas13d nucleic acid sequence from Ruminococcus flavefaciens FD1.

SEQ ID NO: 127 is an exemplary Cas13d nucleic acid sequence from Ruminococcus albus.

SEQ ID NO: 128 is an exemplary Cas13d nucleic acid sequence from Ruminococcus flavefaciens XPD.

SEQ ID NO: 129 is an exemplary consensus DR nucleic acid sequence for E. siraeum Cas13d.

SEQ ID NO: 130 is an exemplary consensus DR nucleic acid sequence for Rum. sp. Cas13d.

SEQ ID NO: 131 is an exemplary consensus DR nucleic acid sequence for Rum. flavefaciens strain XPD3002 Cas13d and CasRx.

SEQ ID NOS: 132-137 are exemplary consensus DR nucleic acid sequences.

SEQ ID NO: 138 is an exemplary 50% consensus sequence for seven full-length Cas13d orthologues.

SEQ ID NO: 139 is an exemplary Cas13d nucleic acid sequence from Gut metagenome P1E0.

SEQ ID NO: 140 is an exemplary Cas13d nucleic acid sequence from Anaerobic digester.

SEQ ID NO: 141 is an exemplary Cas13d nucleic acid sequence from Ruminococcus sp. CAG:57.

SEQ ID NO: 142 is an exemplary human codon-optimized uncultured Gut metagenome P1E0 Cas13d nucleic acid sequence.

SEQ ID NO: 143 is an exemplary human codon-optimized Anaerobic Digester Cas13d nucleic acid sequence.

SEQ ID NO: 144 is an exemplary human codon-optimized Ruminococcus flavefaciens XPD Cas13d nucleic acid sequence.

SEQ ID NO: 145 is an exemplary human codon-optimized Ruminococcus albus Cas13d nucleic acid sequence.

SEQ ID NO: 146 is an exemplary processing of the Ruminococcus sp. CAG:57 CRISPR array.

SEQ ID NO: 147 is an exemplary Cas13d protein sequence from contig emb|OBVH01003037.1, human gut metagenome sequence (also found in WGS contigs emb|OBXZ01000094.1| and emb|OBJF01000033.1).

SEQ ID NO: 148 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 147).

SEQ ID NO: 149 is an exemplary Cas13d protein sequence from contig tpg|DBYI01000091.1| (Uncultivated Ruminococcus flavefaciens UBA1190 assembled from bovine gut metagenome).

SEQ ID NOS: 150-152 are exemplary consensus DR nucleic acid sequences (go with SEQ ID NO: 149).

SEQ ID NO: 153 is an exemplary Cas13d protein sequence from contig tpg|DJXD01000002.1| (uncultivated Ruminococcus assembly, UBA7013, from sheep gut metagenome).

SEQ ID NO: 154 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 153).

SEQ ID NO: 155 is an exemplary Cas13d protein sequence from contig OGZC01000639.1 (human gut metagenome assembly).

SEQ ID NOS: 156-157 are exemplary consensus DR nucleic acid sequences (go with SEQ ID NO: 155).

SEQ ID NO: 158 is an exemplary Cas13d protein sequence from contig emb|OHBM01000764.1 (human gut metagenome assembly).

SEQ ID NO: 159 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 158).

SEQ ID NO: 160 is an exemplary Cas13d protein sequence from contig emb|OHCP01000044.1 (human gut metagenome assembly).

SEQ ID NO: 161 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 160).

SEQ ID NO: 162 is an exemplary Cas13d protein sequence from contig emb|OGDF01008514.1| (human gut metagenome assembly).

SEQ ID NO: 163 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 162).

SEQ ID NO: 164 is an exemplary Cas13d protein sequence from contig emb|OGPN01002610.1 (human gut metagenome assembly).

SEQ ID NO: 165 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 164).

SEQ ID NO: 166 is an exemplary Cas13d protein sequence from contig NFIR01000008.1 (Eubacterium sp. An3, from chicken gut metagenome).

SEQ ID NO: 167 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 166).

SEQ ID NO: 168 is an exemplary Cas13d protein sequence from contig NFLV01000009.1 (Eubacterium sp. An11, from chicken gut metagenome).

SEQ ID NO: 169 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 168).

SEQ ID NOS: 171-174 are exemplary Cas13d motif sequences.

SEQ ID NO: 175 is an exemplary Cas13d protein sequence from contig OJMM01002900 human gut metagenome sequence.

SEQ ID NO: 176 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 175).

SEQ ID NO: 177 is an exemplary Cas13d protein sequence from contig ODAI011611274.1 gut metagenome sequence.

SEQ ID NO: 178 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 177).

SEQ ID NO: 179 is an exemplary Cas13d protein sequence from contig OIZX01000427.1.

SEQ ID NO: 180 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 179).

SEQ ID NO: 181 is an exemplary Cas13d protein sequence from contig emb|OCVV012889144.1.

SEQ ID NO: 182 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 181).

SEQ ID NO: 183 is an exemplary Cas13d protein sequence from contig OCTW011587266.1.

SEQ ID NO: 184 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 183).

SEQ ID NO: 185 is an exemplary Cas13d protein sequence from contig emb|OGNF01009141.1.

SEQ ID NO: 186 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 185).

SEQ ID NO: 187 is an exemplary Cas13d protein sequence from contig emb|OIEN01002196.1.

SEQ ID NO: 188 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 187).

SEQ ID NO: 189 is an exemplary Cas13d protein sequence from contig e-k87_11092736.

SEQ ID NOS: 190-193 are exemplary consensus DR nucleic acid sequences (go with SEQ ID NO: 189).

SEQ ID NO: 194 is an exemplary Cas13d sequence from Gut_metagenome_contig6893000291.

SEQ ID NO: 195 is an exemplary Cas13d motif sequence.

SEQ ID NO: 198 is an exemplary Cas13d protein sequence from Ga0224415_10007274.

SEQ ID NO: 199 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 198).

SEQ ID NO: 200 is an exemplary Cas13d protein sequence from EMG_10003641.

SEQ ID NO: 201 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 200).

SEQ ID NO: 202 is an exemplary Cas13d protein sequence from Ga0129306_1000735.

SEQ ID NO: 203 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 202).

SEQ ID NO: 204 is an exemplary Cas13d protein sequence from Ga0129317_1008067.

SEQ ID NO: 205 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 204).

SEQ ID NO: 206 is an exemplary Cas13d protein sequence from Ga0224415_10048792.

SEQ ID NO: 207 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 206).

SEQ ID NO: 208 is an exemplary Cas13d protein sequence from 160582958_gene49834.

SEQ ID NO: 209 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 208).

SEQ ID NO: 210 is an exemplary Cas13d protein sequence from 250twins_35838_GL0110300.

SEQ ID NO: 211 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 210).

SEQ ID NO: 212 is an exemplary Cas13d protein sequence from 250twins_36050_GL0158985.

SEQ ID NO: 213 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 212).

SEQ ID NO: 214 is an exemplary Cas13d protein sequence from 31009_GL0034153.

SEQ ID NO: 215 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 214).

SEQ ID NO: 216 is an exemplary Cas13d protein sequence from 530373_GL0023589.

SEQ ID NO: 217 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 216).

SEQ ID NO: 218 is an exemplary Cas13d protein sequence from BMZ-11B_GL0037771.

SEQ ID NO: 219 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 218).

SEQ ID NO: 220 is an exemplary Cas13d protein sequence from BMZ-11B_GL0037915.

SEQ ID NO: 221 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 220).

SEQ ID NO: 222 is an exemplary Cas13d protein sequence from BMZ-11B_GL0069617.

SEQ ID NO: 223 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 222).

SEQ ID NO: 224 is an exemplary Cas13d protein sequence from DLF014_GL0011914.

SEQ ID NO: 225 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 224).

SEQ ID NO: 226 is an exemplary Cas13d protein sequence from EYZ-362B_GL0088915.

SEQ ID NOS: 227-228 are exemplary consensus DR nucleic acid sequences (go with SEQ ID NO: 226).

SEQ ID NO: 229 is an exemplary Cas13d protein sequence from Ga0099364_10024192.

SEQ ID NO: 230 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 229).

SEQ ID NO: 231 is an exemplary Cas13d protein sequence from Ga0187910_10006931.

SEQ ID NO: 232 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 231).

SEQ ID NO: 233 is an exemplary Cas13d protein sequence from Ga0187910_10015336.

SEQ ID NO: 234 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 233).

SEQ ID NO: 235 is an exemplary Cas13d protein sequence from Ga0187910_10040531.

SEQ ID NO: 236 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 235).

SEQ ID NO: 237 is an exemplary Cas13d protein sequence from Ga0187911_10069260.

SEQ ID NO: 238 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 237).

SEQ ID NO: 239 is an exemplary Cas13d protein sequence from MH0288_GL0082219.

SEQ ID NO: 240 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 239).

SEQ ID NO: 241 is an exemplary Cas13d protein sequence from 02.UC29-0_GL0096317.

SEQ ID NO: 242 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 241).

SEQ ID NO: 243 is an exemplary Cas13d protein sequence from PIG-014_GL0226364.

SEQ ID NO: 244 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 243).

SEQ ID NO: 245 is an exemplary Cas13d protein sequence from PIG-018_GL0023397.

SEQ ID NO: 246 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 245).

SEQ ID NO: 247 is an exemplary Cas13d protein sequence from PIG-025_GL0099734.

SEQ ID NO: 248 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 247).

SEQ ID NO: 249 is an exemplary Cas13d protein sequence from PIG-028_GL0185479.

SEQ ID NO: 250 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 249).

SEQ ID NO: 251 is an exemplary Cas13d protein sequence from Ga0224422_10645759.

SEQ ID NO: 252 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 251).

SEQ ID NO: 253 is an exemplary Cas13d protein sequence from ODAI chimera.

SEQ ID NO: 254 is an exemplary consensus DR nucleic acid sequence (goes with SEQ ID NO: 253).

SEQ ID NOs: 256 and 257 are exemplary Cas13d nuclear localization signal amino acid and nucleic acid sequences, respectively.

SEQ ID NOs: 258 and 260 are exemplary SV40 large T antigen nuclear localization signal amino acid and nucleic acid sequences, respectively.

SEQ ID NO: 259 is a dCas9 target sequence.

SEQ ID NO: 261 is an artificial Eubacterium siraeum nCas1 array targeting ccdB.

SEQ ID NO: 262 is a full 36 nt direct repeat.

SEQ ID NOs: 263-266 are spacer sequences.

SEQ ID NO: 267 is an artificial uncultured Ruminococcus sp. nCas1 array targeting ccdB.

SEQ ID NO: 268 is a full 36 nt direct repeat.

SEQ ID NOs: 269-272 are spacer sequences.

SEQ ID NO: 273 is a ccdB target RNA sequence.

SEQ ID NOs: 274-277 are spacer sequences.

SEQ ID NO: 278 is a gRNA sequence.

SEQ ID NO: 279 is a consensus sequence of HEPN1 domain region.

SEQ ID NOs: 280-286 are HEPN1 domain regions from seven Cas13d proteins.

SEQ ID NO: 287 is a consensus sequence of HEPN2 domain region.

SEQ ID NOs: 288-294 are HEPN2 domain regions from seven Cas13d proteins.

SEQ ID NO: 295 is an exemplary RNA target sequence.

SEQ ID NOs: 296-309 are exemplary gRNA sequences with various truncations.

SEQ ID NO: 310 is an exemplary Cas13d protein sequence from 037_-_emb|OIZA01000315.1.

SEQ ID NO: 311 is an exemplary Cas13d protein sequence from PIG-022_GL0026351.

SEQ ID NO: 312 is an exemplary Cas13d protein sequence from PIG-046_GL0077813.

SEQ ID NO: 313 is an exemplary Cas13d protein sequence from pig_chimera.

DETAILED DESCRIPTION

Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology can be found in Benjamin Lewin, Genes VII, published by Oxford University Press, 1999; Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994; and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995; and other similar references.

As used herein, the singular forms “a,” “an,” and “the,” refer to both the singular as well as plural, unless the context clearly indicates otherwise. As used herein, the term “comprises” means “includes.” Thus, “comprising a nucleic acid molecule” means “including a nucleic acid molecule” without excluding other elements. It is further to be understood that any and all base sizes given for nucleic acids are approximate, and are provided for descriptive purposes, unless otherwise indicated. Although many methods and materials similar or equivalent to those described herein can be used, particular suitable methods and materials are described below. In case of conflict, the present specification, including explanations of terms, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting. All references, including patent applications and patents, are herein incorporated by reference in their entireties.

In order to facilitate review of the various embodiments of the disclosure, the following explanations of specific terms are provided:

I. Terms

Administration: To provide or give a subject an agent, such as a Cas13d protein (or Cas13d coding sequence) or guide molecule (or coding sequence) disclosed herein, by any effective route. Exemplary routes of administration include, but are not limited to, injection (such as subcutaneous, intramuscular, intradermal, intraperitoneal, intratumoral, and intravenous), transdermal, intranasal, and inhalation routes.

Cas13d (also referred to as CasR, for CRISPR-associated RNase, and nCas1): An RNA-guided RNA endonuclease enzyme that can cut or bind RNA. Cas13d proteins include one or two HEPN domains (e.g., see SEQ ID NOS: 1-3, 42, 62, 70, 82, 83, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, and 253). Native HEPN domains include the sequence RXXXXH. Cas13d proteins that include mutated HEPN domain(s), and thus cannot cut RNA, but can process guide RNA, are also encompassed by this disclosure (e.g., see SEQ ID NOS: 2 and 4). An alignment of native Cas13d proteins is shown in FIGS. 18A-18MMM. In addition, Cas13d proteins specifically recognize direct repeat sequences of gRNA having a particular secondary structure (e.g., see FIG. 3C). In one example, Cas13d proteins recognize and/or bind a DR having (1) a loop of about 4 to 8 nt, (2) a stem of 4 to 12 nt formed of complementary nucleotides, which can include a small (e.g., 1 or 2 bp) bulge due to a nt mismatch in the stem, and (3) a bulge or overhang formed of unpaired nt, which can be about 10 to 14 nt (e.g., 5 to 7 nt on either side).

In one example, the full length (non-truncated) Cas13d protein is between 870-1080 amino acids long. In one example, the Cas13d protein is derived from a genome sequence of a bacterium from the Order Clostridiales or a metagenomic sequence. In one example, the corresponding DR sequence of a Cas13d protein is located at the 5′ end of the spacer sequence in the molecule that includes the Cas13d gRNA. In one example, the DR sequence in the Cas13d gRNA is truncated at the 5′ end relative to the DR sequence in the unprocessed Cas13d guide array transcript. In one example, the DR sequence in the Cas13d gRNA is truncated by 5-7 nt at the 5′ end by the Cas13d protein. In one example, the Cas13d protein can cut a target RNA flanked at the 3′ end of the spacer-target duplex by any of an A, U, G, or C ribonucleotide and flanked at the 5′ end by any of an A, U, G, or C ribonucleotide.

In one example, a Cas13d protein has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253.

In one example, a Cas13d protein contains two HEPN RNase domains, each of which contains an RXXXXH amino acid motif (where X indicates any amino acid). In addition, a Cas13d protein can include one or more of the following amino acid motifs written in the common Prosite format:

Motif 1 (SEQ ID NO: 195): L-x(5)-[FWY]-x(3)-K-[NQS]-[ILM]-[ILMV]-x(2)-N-x(2)-[FWY]-x(2)-[AG]-x(4)-[DE]-x-D

Motif 2: [FWY]-[ILV]-x(2)-[NQS]-[ILV]-x(2)-[DNST]-x(2)-F-x-Y-x(2)-[HKR]-[FHY]

Motif 3: Y-[CDNSV]-x(2)-R-[FWY]-x-[ADNT]-[LM]-[ST]-x(4)-[FWY]

Thus, in some examples, a Cas13d protein having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253, includes the motif of SEQ ID NO: 195, Motif 2, or Motif 3.
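For illustration only, Prosite-format motifs such as those above can be translated into regular expressions and used to scan a candidate protein sequence. The following is a minimal Python sketch and is not part of the disclosed methods; the function names `prosite_to_regex` and `find_motifs` and the toy sequences in the usage note are hypothetical, and the converter handles only the pattern elements that appear in the motifs above (single residues, bracketed residue sets, and `x` with an optional repeat count).

```python
import re

# Prosite-format patterns copied from the motif definitions in this section.
# "R-x(4)-H" is the RXXXXH HEPN motif written in the same notation.
MOTIFS = {
    "Motif 1 (SEQ ID NO: 195)": (
        "L-x(5)-[FWY]-x(3)-K-[NQS]-[ILM]-[ILMV]-x(2)-N-x(2)-"
        "[FWY]-x(2)-[AG]-x(4)-[DE]-x-D"
    ),
    "Motif 2": (
        "[FWY]-[ILV]-x(2)-[NQS]-[ILV]-x(2)-[DNST]-x(2)-F-x-Y-x(2)-[HKR]-[FHY]"
    ),
    "Motif 3": "Y-[CDNSV]-x(2)-R-[FWY]-x-[ADNT]-[LM]-[ST]-x(4)-[FWY]",
    "HEPN RXXXXH": "R-x(4)-H",
}

# One pattern element: x, a single residue, or a bracketed set,
# optionally followed by a repeat count in parentheses.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Translate a simple Prosite pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported Prosite element: %r" % element)
        residue, count = match.groups()
        if residue == "x":
            residue = "."  # x means any amino acid
        parts.append(residue + ("{%s}" % count if count else ""))
    return "".join(parts)


def find_motifs(protein):
    """Return 1-based start positions of non-overlapping hits per motif."""
    hits = {}
    for name, pattern in MOTIFS.items():
        regex = re.compile(prosite_to_regex(pattern))
        hits[name] = [m.start() + 1 for m in regex.finditer(protein)]
    return hits
```

As a usage example, `find_motifs("MKRAAAAHGG")` reports a hit for the RXXXXH pattern starting at residue 3 of that made-up peptide. A mutated HEPN domain, as in SEQ ID NOS: 2 and 4, would be expected to give fewer RXXXXH hits than the native protein, although that expectation is an assumption of this sketch rather than a tested result.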

Complementarity: The ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
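The percent complementarity calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the two strands are assumed to be the same length and both written 5′ to 3′, only Watson-Crick pairs are counted (non-traditional pairs such as G-U wobble are deliberately ignored), and the default thresholds of 60% and 8 nucleotides are simply the lowest values listed in the definition. Hybridization under stringent conditions is not modeled.

```python
# Watson-Crick pairs for RNA and DNA (assumption: wobble pairs are not counted).
WATSON_CRICK = {
    ("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(strand, other):
    """Percent of positions that pair when two equal-length strands,
    both written 5'->3', are aligned antiparallel."""
    if len(strand) != len(other) or not strand:
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse the second strand so position i of one faces position i of the other.
    facing = other[::-1].upper()
    paired = sum((a, b) in WATSON_CRICK for a, b in zip(strand.upper(), facing))
    return 100.0 * paired / len(strand)


def is_substantially_complementary(strand, other, min_percent=60.0, min_length=8):
    """Apply the percent/length form of the 'substantially complementary' test."""
    return (
        len(strand) >= min_length
        and percent_complementarity(strand, other) >= min_percent
    )
```

For example, the made-up 10 nt sequence `AUGGCUAGCU` and its reverse complement `AGCUAGCCAU` give 100%, corresponding to "10 out of 10" in the definition, while changing one residue of the second strand gives 90%.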

CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats): The CRISPR RNA array is a defining feature of CRISPR systems. The term “CRISPR” refers to the architecture of the array which includes constant direct repeats (DRs) interspaced with the variable spacers. In some examples, a CRISPR array includes at least a DR-spacer-DR-spacer (see FIG. 1A). This feature was used in the disclosed computational pipeline to identify the novel Cas13d protein family (FIG. 1A). In bacteria, the array is transcribed as one single transcript (containing multiple crRNA units), which is then processed by the Cas13d protein and other RNases into individual crRNAs. CRISPRs are found in approximately 40% of sequenced bacterial genomes and 90% of sequenced archaea. CRISPRs are often associated with cas genes that code for proteins related to CRISPRs (such as the Cas13d proteins provided herein). The disclosed CRISPR/Cas system can be used for RNA targeting, for example to detect a target RNA, modify a target RNA at any desired location, or cut the target RNA at any desired location.
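The DR-spacer-DR-spacer architecture described above can be pictured with a short Python sketch. It is a toy model under stated assumptions, not the disclosed computational pipeline and not a model of the enzymatic processing: the 36 nt direct repeat and the two 22 nt spacers are artificial placeholder sequences, the function names are hypothetical, and the 6 nt trim is just one value within the 5-7 nt 5′ truncation of the DR mentioned in the Cas13d definition.

```python
# Artificial placeholder sequences (assumptions, not sequences from the listing).
TOY_DR = "ACGU" * 9          # 36 nt direct repeat
TOY_SPACERS = ["GA" * 11, "CU" * 11]  # two 22 nt spacers


def build_array(dr, spacers):
    """Assemble a DR-spacer-DR-spacer... transcript."""
    return "".join(dr + spacer for spacer in spacers)


def split_array(array, dr):
    """Recover the spacers by splitting the transcript on exact DR copies."""
    return [segment for segment in array.split(dr) if segment]


def mature_crrna(dr, spacer, trim=6):
    """Model one processed unit: a 5'-truncated DR followed by its spacer."""
    return dr[trim:] + spacer
```

With these placeholders, `split_array(build_array(TOY_DR, TOY_SPACERS), TOY_DR)` returns the two spacers in order, and each `mature_crrna` unit is 52 nt long (30 nt of DR plus a 22 nt spacer). Exact string matching is a simplification; DR copies in real arrays can vary slightly.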

Downregulated or knocked down: When used in reference to the expression of a molecule, such as a target RNA, refers to any process which results in a decrease in production of the target RNA, but in some examples not complete elimination of the target RNA product or target RNA function. In one example, downregulation or knock down does not result in complete elimination of detectable target RNA expression or target RNA activity. In some examples, the target RNA is a coding RNA. In some examples, the target RNA is non-coding RNA. Specific examples of RNA molecules that can be targeted for downregulation include mRNA, miRNA, rRNA, tRNA, nuclear RNA, lincRNA, circular RNA, and structural RNA. In some examples, downregulation or knock down of a target RNA includes processes that decrease translation of the target RNA and thus can decrease the presence of corresponding proteins. The disclosed CRISPR/Cas system can be used to downregulate any target RNA of interest.

Downregulation or knock down includes any detectable decrease in the target RNA. In certain examples, detectable target RNA in a cell or cell free system decreases by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% (such as a decrease of 40% to 90%, 40% to 80% or 50% to 95%) as compared to a control (such as an amount of target RNA detected in a corresponding normal cell or sample). In one example, a control is a relative amount of expression in a normal cell (e.g., a non-recombinant cell that does not include Cas13d or guide RNA).
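By way of a worked example, the percent decrease relative to a control can be computed as follows. The function name and the idea of passing already-normalized expression values (for example, relative transcript abundance) are assumptions made for illustration; the disclosure does not prescribe a particular calculation.

```python
# Illustrative sketch: percent knockdown of a target RNA relative to a control.
# Assumption: both inputs are expression values on the same normalized scale.
def percent_knockdown(treated: float, control: float) -> float:
    """Percent decrease of the treated value relative to the control value."""
    if control <= 0:
        raise ValueError("control expression must be positive")
    return 100.0 * (control - treated) / control


# A sample retaining 25 units against a control of 100 units is a 75% knockdown,
# which falls within the "40% to 90%" range mentioned above.
print(percent_knockdown(25.0, 100.0))
```

A negative return value would indicate an increase rather than a decrease relative to the control.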

Effective amount: The amount of an agent (such as the CRISPR/Cas agents provided herein) that is sufficient to effect beneficial or desired results.

A therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art. The beneficial therapeutic effect can include enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition. In one embodiment, an “effective amount” is an amount sufficient to reduce symptoms of a disease, for example by at least 10%, at least 20%, at least 50%, at least 70%, or at least 90% (as compared to no administration of the therapeutic agent).

The term also applies to a dose that will allow for expression of a Cas13d and/or gRNA herein, and that allows for targeting (e.g., detection or modification) of a target RNA.

Increase or Decrease: A statistically significant positive or negative change, respectively, in quantity from a control value. An increase is a positive change, such as an increase of at least 50%, at least 100%, at least 200%, at least 300%, at least 400% or at least 500% as compared to the control value. A decrease is a negative change, such as a decrease of at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% as compared to a control value. In some examples the decrease is less than 100%, such as a decrease of no more than 90%, no more than 95% or no more than 99%.

Isolated: An “isolated” biological component (such as a Cas13d protein or nucleic acid, gRNA, or cell containing such) has been substantially separated, produced apart from, or purified away from other biological components in the cell or tissue of an organism in which the component occurs, such as other cells, chromosomal and extrachromosomal DNA and RNA, and proteins. Nucleic acids and proteins that have been “isolated” include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids and proteins. Isolated Cas13d proteins or nucleic acids, or cells containing such, in some examples are at least 50% pure, such as at least 75%, at least 80%, at least 90%, at least 95%, at least 98%, or 100% pure.

Label: A compound or composition that is conjugated directly or indirectly to another molecule (such as a nucleic acid molecule) to facilitate detection of that molecule. Specific, non-limiting examples of labels include fluorescent and fluorogenic moieties, chromogenic moieties, haptens, affinity tags, and radioactive isotopes. The label can be directly detectable (e.g., optically detectable) or indirectly detectable (for example, via interaction with one or more additional molecules that are in turn detectable).

Modulate: A change in the content of RNA. Modulation can include, but is not limited to, RNA activation (e.g., upregulation), RNA repression (e.g., downregulation), ribonucleotide deletion, ribonucleotide insertion, ribonucleotide chemical modification, ribonucleotide covalent or non-covalent linkage, and/or ribonucleotide substitution.

Non-naturally occurring or engineered: Terms used herein interchangeably to indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides, indicate that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated in nature and as found in nature. In addition, the terms can indicate that the nucleic acid molecules or polypeptides have a sequence not found in nature.

Operably linked: A first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence (such as a coding sequence of a Cas13d protein) if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences are contiguous and, where necessary to join two protein-coding regions, in the same reading frame.

Pharmaceutically acceptable carriers: The pharmaceutically acceptable carriers useful in this invention are conventional. Remington's Pharmaceutical Sciences, by E. W. Martin, Mack Publishing Co., Easton, PA, 15th Edition (1975), describes compositions and formulations suitable for pharmaceutical delivery of a Cas13d protein or nucleic acid molecule (or other molecules needed for modifying RNA using the disclosed CRISPR/Cas system with the disclosed Cas13d proteins).

In general, the nature of the carrier will depend on the particular mode of administration being employed. For instance, parenteral formulations usually comprise injectable fluids that include pharmaceutically and physiologically acceptable fluids such as water, physiological saline, balanced salt solutions, aqueous dextrose, glycerol or the like as a vehicle. In addition to biologically-neutral carriers, pharmaceutical compositions to be administered can contain minor amounts of non-toxic auxiliary substances, such as wetting or emulsifying agents, preservatives, and pH buffering agents and the like, for example sodium acetate or sorbitan monolaurate.

Polypeptide, peptide and protein: Refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component. As used herein the term “amino acid” includes natural and/or unnatural or synthetic amino acids, including glycine and both the D or L optical isomers, and amino acid analogs and peptidomimetics.

Promoter: An array of nucleic acid control sequences which direct transcription of a nucleic acid. A promoter includes necessary nucleic acid sequences near the start site of transcription. A promoter also optionally includes distal enhancer or repressor elements. A “constitutive promoter” is a promoter that is continuously active and is not subject to regulation by external signals or molecules. In contrast, the activity of an “inducible promoter” is regulated by an external signal or molecule (for example, a transcription factor).

Recombinant or host cell: A cell that has been genetically altered, or is capable of being genetically altered by introduction of an exogenous polynucleotide, such as a recombinant plasmid or vector. Typically, a host cell is a cell in which a vector can be propagated and its nucleic acid expressed. Such cells can be eukaryotic or prokaryotic. The term also includes any progeny of the subject host cell. It is understood that all progeny may not be identical to the parental cell since there may be mutations that occur during replication. However, such progeny are included when the term “host cell” is used.

Regulatory element: A phrase that includes promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990), which is hereby incorporated by reference in its entirety. Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific.

In some embodiments, a vector provided herein includes a pol III promoter (e.g., U6 and H1 promoters), a pol II promoter (e.g., the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter), or both.

Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5′ segment in LTR of HTLV-I; SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin.

RNA Editing: A type of genetic engineering in which an RNA molecule (or ribonucleotides of the RNA) is inserted, deleted or replaced in the genome of an organism using engineered nucleases (such as the Cas13d proteins provided herein), which create site-specific strand breaks at desired locations in the RNA. The induced breaks are repaired, resulting in targeted mutations or repairs. The CRISPR/Cas methods disclosed herein, such as those that use a Cas13d, can be used to edit the sequence of one or more target RNAs, such as one associated with cancer (e.g., breast cancer, colon cancer, melanoma), infectious disease (such as HIV, hepatitis, HPV, and West Nile virus), or neurodegenerative disorder (e.g., Huntington's disease or ALS). For example, RNA editing can be used to treat a disease or viral infection.

RNA insertion site: A site of the RNA that is targeted for, or has undergone, insertion of an exogenous polynucleotide. The disclosed methods include use of a disclosed Cas13d protein, which can be used to target an RNA for manipulation at an RNA insertion site.

Sequence identity/similarity: The similarity between amino acid (or nucleotide) sequences is expressed in terms of the similarity between the sequences, otherwise referred to as sequence identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more similar the two sequences are.

Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith and Waterman, Adv. Appl. Math. 2:482, 1981; Needleman and Wunsch, J. Mol. Biol. 48:443, 1970; Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444, 1988; Higgins and Sharp, Gene 73:237, 1988; Higgins and Sharp, CABIOS 5:151, 1989; and Corpet et al., Nucleic Acids Research 16:10881, 1988. Altschul et al., Nature Genet. 6:119, 1994, presents a detailed consideration of sequence alignment methods and homology calculations.

The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, MD) and on the internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. A description of how to determine sequence identity using this program is available on the NCBI website on the internet.

Variants of protein and nucleic acid sequences known in the art and disclosed herein are typically characterized by possession of at least about 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity counted over the full length alignment with the amino acid sequence using the NCBI Blast 2.0, gapped blastp set to default parameters. For comparisons of amino acid sequences of greater than about 30 amino acids, the Blast 2 sequences function is employed using the default BLOSUM62 matrix set to default parameters (gap existence cost of 11, and a per residue gap cost of 1). When aligning short peptides (fewer than around 30 amino acids), the alignment should be performed using the Blast 2 sequences function, employing the PAM30 matrix set to default parameters (open gap 9, extension gap 1 penalties). Proteins with even greater similarity to the reference sequences will show increasing percentage identities when assessed by this method, such as at least 95%, at least 98%, or at least 99% sequence identity. When less than the entire sequence is being compared for sequence identity, homologs and variants will typically possess at least 80% sequence identity over short windows of 10-20 amino acids, and may possess sequence identities of at least 85% or at least 90% or at least 95% depending on their similarity to the reference sequence. Methods for determining sequence identity over such short windows are available at the NCBI website on the internet. One of skill in the art will appreciate that these sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that fall outside of the ranges provided.

Thus, in one example, a Cas13d protein has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253.
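For illustration, percent identity over an existing pairwise alignment can be computed as in the sketch below. It assumes the two sequences have already been aligned (for example, by a BLAST-type tool) and are supplied as equal-length strings with "-" marking gaps; it counts identical residues over the alignment length and is not a substitute for the BLAST procedure described above. The function name and example sequences are hypothetical.

```python
# Illustrative sketch: percent identity over a precomputed pairwise alignment.
# Assumption: inputs are equal-length aligned strings with "-" for gaps.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the stated percent identity (e.g., at least 80%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned peptides: 5 identical columns out of 7.
print(percent_identity("MKV-LTA", "MKVALSA"))
print(meets_identity_threshold("MKV-LTA", "MKVALSA"))
```

Other conventions divide by the shorter sequence length or exclude gap columns; the choice of denominator changes the reported value and should be stated alongside any threshold.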

Subject: A vertebrate, such as a mammal, for example a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. In one embodiment, the subject is a non-human mammalian subject, such as a monkey or other non-human primate, mouse, rat, rabbit, pig, goat, sheep, dog, cat, horse, or cow. In some examples, the subject has a disorder (e.g., viral infection) or genetic disease that can be treated using methods provided herein. In some examples, the subject has a disorder (e.g., viral infection) or genetic disease that can be diagnosed using methods provided herein. In some examples, the subject is a laboratory animal/organism, such as a zebrafish, Xenopus, C. elegans, Drosophila, mouse, rabbit, or rat. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

Therapeutic agent: Refers to one or more molecules or compounds that confer some beneficial effect upon administration to a subject. The beneficial therapeutic effect can include enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition.

Transduced, Transformed and Transfected: A virus or vector “transduces” a cell when it transfers nucleic acid molecules into a cell. A cell is “transformed” or “transfected” by a nucleic acid transduced into the cell when the nucleic acid becomes stably replicated by the cell, either by incorporation of the nucleic acid into the cellular genome, or by episomal replication.

These terms encompass all techniques by which a nucleic acid molecule can be introduced into such a cell, including transfection with viral vectors, transformation with plasmid vectors, and introduction of naked DNA by electroporation, lipofection, particle gun acceleration and other methods in the art. In some examples the method is a chemical method (e.g., calcium-phosphate transfection), physical method (e.g., electroporation, microinjection, particle bombardment), fusion (e.g., liposomes), receptor-mediated endocytosis (e.g., DNA-protein complexes, viral envelope/capsid-DNA complexes) or biological infection by viruses such as recombinant viruses (Wolff, J. A., ed, Gene Therapeutics, Birkhauser, Boston, USA, 1994). Methods for the introduction of nucleic acid molecules into cells are known (e.g., see U.S. Pat. No. 6,110,743). These methods can be used to transduce a cell with the disclosed agents to manipulate its genome.

Transgene: An exogenous gene.

Treating, Treatment, and Therapy: Any success or indicia of success in the attenuation or amelioration of an injury, pathology or condition, including any objective or subjective parameter such as abatement, remission, diminishing of symptoms or making the condition more tolerable to the patient, slowing in the rate of degeneration or decline, making the final point of degeneration less debilitating, improving a subject's physical or mental well-being, or prolonging the length of survival. The treatment may be assessed by objective or subjective parameters, including the results of a physical examination, blood and other clinical tests, and the like. For prophylactic benefit, the disclosed compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.

Upregulated: When used in reference to the expression of a molecule, such as a target RNA, refers to any process which results in an increase in production of the target RNA. In one example, the term includes direct upregulation, for example if the target RNA participates in a feedback loop with its own transcription. In one example, the term includes indirect upregulation, such as by knockdown of an inhibitory miRNA that leads to the activation of a target of that miRNA.

In some examples, the target RNA is a coding RNA. In some examples, the target RNA is non-coding RNA. Specific examples of RNA molecules that can be targeted for upregulation include mRNA, miRNA, rRNA, tRNA, nuclear RNA, and structural RNA. In some examples, upregulation or activation of a target RNA includes processes that increase translation of the target RNA and thus can increase the presence of corresponding proteins. The disclosed CRISPR/Cas system can be used to upregulate any target RNA of interest.

Upregulation includes any detectable increase in target RNA. In certain examples, detectable target RNA expression in a cell or cell free system (such as a cell expressing a Cas13d protein and gRNA disclosed herein) increases by at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 90%, at least 95%, at least 100%, at least 200%, at least 400%, or at least 500% as compared to a control (such as an amount of target RNA detected in a corresponding normal cell or sample). In one example, a control is a relative amount of expression in a normal cell (e.g., a non-recombinant cell that does not include Cas13d or guide RNA).

Under conditions sufficient for: A phrase that is used to describe any environment that permits a desired activity. In one example the desired activity is expression of a Cas13d protein disclosed herein, in combination with other necessary elements, for example to modulate a target RNA.

Vector: A nucleic acid molecule into which a foreign nucleic acid molecule can be introduced without disrupting the ability of the vector to replicate and/or integrate in a host cell. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.

A vector can include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication. A vector can also include one or more selectable marker genes and other genetic elements known in the art. An integrating vector is capable of integrating itself into a host nucleic acid. An expression vector is a vector that contains the necessary regulatory sequences to allow transcription and translation of an inserted gene or genes.

One type of vector is a “plasmid,” which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. In some embodiments, the vector is a lentivirus (such as an integration-deficient lentiviral vector) or adeno-associated viral (AAV) vector.

Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.

Certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “expression vectors.” Common expression vectors are often in the form of plasmids. Recombinant expression vectors can comprise a nucleic acid provided herein (such as a guide RNA [which can be expressed from a DNA sequence or an RNA sequence], nucleic acid encoding a Cas13d protein) in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspaced short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.).

II. Overview of Several Embodiments

Class 2 CRISPR-Cas systems endow microbes with diverse mechanisms for adaptive immunity. Provided herein is an analysis of prokaryotic genome and metagenome sequences to identify an uncharacterized family of RNA-guided, RNA-targeting CRISPR systems which is classified as Type VI-D. Biochemical characterization and protein engineering of seven distinct orthologs generated a ribonuclease effector derived from Ruminococcus flavefaciens XPD3002 (CasRx) with robust activity in human cells. CasRx-mediated knockdown exhibits high efficiency and specificity relative to RNA interference across diverse endogenous transcripts. As one of the most compact single effector Cas enzymes, CasRx can also be flexibly packaged into adeno-associated virus. Virally encoded, catalytically inactive CasRx can be targeted to cis-elements of pre-mRNA to manipulate alternative splicing, alleviating dysregulated tau isoform ratios in a neuronal model of frontotemporal dementia. The results herein present CasRx as a programmable RNA-binding module for efficient targeting of cellular RNA, enabling a general platform for transcriptome engineering and therapeutic methods.

Class 2 CRISPR systems are found throughout diverse bacterial and archaeal life. Using a minimal definition of the CRISPR locus for bioinformatic mining of prokaryotic genome and metagenome sequences, which requires only a CRISPR repeat array and a nearby protein, provided herein is the identification of an uncharacterized, remarkably compact family of RNA-targeting class 2 CRISPR systems designated as Type VI CRISPR-Cas13d.

Because CRISPR systems generally exist as a functional operon within 20 kilobases of genome sequence, even fragmented metagenome reads may be sufficient to recover useful Cas enzymes for bioengineering purposes. CRISPR genome mining strategies described herein and by others (Shmakov et al., 2015), combined with ongoing efforts to profile microbial populations via next-generation sequencing, should contribute mechanistically diverse additions to the genome engineering toolbox.

Two distinct ribonuclease properties of the Cas13d effector, which processes a CRISPR repeat array into mature guides via a HEPN domain-independent mechanism followed by guide sequence-dependent recognition of a complementary activator RNA, were biochemically characterized. This recognition triggers HEPN-mediated RNase activity, enabling Cas13d to cleave both activator and bystander RNAs, a property shared by other RNA-targeting CRISPR systems. Cas13d additionally exhibits no apparent flanking sequence requirements and was found to be active across crRNAs tiling a target RNA, suggesting the ability to target arbitrary single-stranded RNA sequences.
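The notion of crRNAs tiling a target RNA can be sketched as follows: each candidate spacer is the reverse complement of a window of the target, and windows are taken at regular offsets. Because no flanking sequence requirement is described above, the sketch applies no flanking-sequence filter. The spacer length, step size, function names, and example target are hypothetical values chosen for illustration and are not taken from this disclosure.

```python
# Illustrative sketch: generate spacers that tile a target RNA.
# Assumptions: spacer_len and step are arbitrary illustrative defaults;
# spacers pair with the target by Watson-Crick complementarity.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (DNA input is converted to RNA)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def tile_spacers(target_rna: str, spacer_len: int = 22, step: int = 5) -> list[tuple[int, str]]:
    """Return (start offset, spacer) pairs covering the target at regular steps."""
    if spacer_len <= 0 or step <= 0:
        raise ValueError("spacer_len and step must be positive")
    target = target_rna.upper().replace("T", "U")
    last_start = len(target) - spacer_len
    return [
        (start, reverse_complement(target[start:start + spacer_len]))
        for start in range(0, last_start + 1, step)
    ]


# Hypothetical 12-nt target tiled with 6-nt spacers every 3 nt.
for start, spacer in tile_spacers("AUGGCAUCGAUU", spacer_len=6, step=3):
    print(start, spacer)
```

In practice, candidate spacers would still be screened empirically, since activity can vary between positions on the target.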

A comprehensive activity reporter screen in human cells of Cas13d orthologs sampled from distinct branches of the Cas13d family revealed that NLS fusions to Cas13d from Ruminococcus flavefaciens strain XPD3002 (CasRx) can be engineered for programmable RNA targeting in a eukaryotic context (FIG. 8D). CasRx fusions knocked down a diverse set of 14 endogenous mRNAs and lncRNAs, consistently achieving >90% knockdown with favorable efficiency relative to RNA interference, dCas9-mediated CRISPR interference, and other members of the Cas13 superfamily (FIGS. 10A-10C). Additionally, CasRx interference is markedly more specific than spacer-matching shRNAs, with no detectable off-target changes compared with hundreds for RNA interference.

CasRx is a minimal two-component platform, including an engineered CRISPR-Cas13d effector and an associated guide RNA, and can be fully genetically encoded. Because CasRx is an orthogonally delivered protein, HEPN-inactive dCasRx can be engineered as a flexible RNA-binding module to target specific RNA elements. Importantly, because CasRx uses a distinct ribonuclease activity to process guide RNAs, dCasRx can still be paired with a repeat array for multiplexing applications. The utility of this concept is shown herein by creating a dCasRx splice effector fusion for tuning alternative splicing and resulting protein isoform ratios, applying it in a neuronal model of frontotemporal dementia.

At an average size of 930 aa, Cas13d is the smallest class 2 CRISPR effector characterized in mammalian cells. This allows CasRx effector domain fusions to be paired with a CRISPR array encoding multiple guide RNAs while remaining under the packaging size limit of the versatile adeno-associated virus (AAV) delivery vehicle (Naldini, 2015) for primary cell and in vivo delivery. Further, targeted AAV delivery of CasRx to specific postmitotic cell types such as neurons can mediate long-term expression of a corrective payload that avoids permanent genetic modifications or frequent re-administration (Chiriboga et al., 2016), complementing other nucleic acid targeting technologies such as DNA nuclease editing or antisense oligonucleotides. RNA mis-splicing diseases have been estimated to account for up to 15% of genetic diseases (Hammond and Wood, 2011), highlighting the potential for engineered splice effectors capable of multiplexed targeting. The materials provided herein can be used for RNA targeting for knockdown and splicing, as well as for applications ranging from live cell labeling and genetic screens to transcript imaging, trafficking, or regulation. CRISPR-Cas13d and engineered variants such as CasRx collectively enable flexible nucleic acid engineering, transcriptome-related study, and therapeutics, expanding the genome editing toolbox beyond DNA to RNA.

Provided herein are methods of targeting (e.g., modifying, detecting) one or more target RNA molecules, such as a clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR associated (Cas) system-mediated RNA editing method. Such methods can include contacting one or more target RNA molecules with a non-naturally occurring or engineered (e.g., does not naturally occur in the cell or system into which it is introduced) CRISPR-Cas system. Thus, in some examples, the disclosed CRISPR-Cas system includes a naturally occurring Cas13d protein (or coding sequence) and a naturally occurring gRNA, but is used in a system or cell in which the Cas13d protein (or coding sequence) and the gRNA are not naturally found. Furthermore, the spacer sequence within the gRNA molecule is not naturally occurring, and has been modified to be complementary to the target RNA molecule.

In some examples, a target RNA is a coding RNA. In some examples, the RNA is non-coding RNA.

The disclosed CRISPR-Cas system can include (1) at least one Cas13d protein or at least one Cas13d nucleic acid coding sequence (such as an mRNA or a vector encoding the at least one Cas13d protein) and (2) at least one CRISPR-Cas system guide nucleic acid molecule (e.g., gRNA) (or at least one nucleic acid molecule encoding the gRNA) having sufficient complementarity to a target RNA such that it can hybridize to a target RNA molecule. The Cas13d protein forms a complex with the gRNA, and the gRNA directs the complex to the one or more target RNA molecules. This targeting can allow the Cas13d-gRNA complex to modify or detect the one or more target RNA molecules. In some examples, the one or more target RNA molecules (or a cell containing the one or more target RNA molecules) are contacted with a complex comprising the at least one Cas13d protein and the at least one gRNA. In some examples, the system includes Mg²⁺. However, in some examples, the system does not require Mg²⁺, such as if cleavage of the target RNA is not desired.

In some examples, contacting the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system includes introducing into a cell (such as a eukaryotic or prokaryotic cell) containing the one or more target RNA molecules the non-naturally occurring or engineered CRISPR-Cas system, for example using endocytosis (e.g., receptor-mediated endocytosis, micropinocytosis), a liposome, a particle, an exosome, a microvesicle, a gene gun, electroporation, a virus, an RNP-antibody fusion (e.g., by tethering a Cas13d RNP to an antibody, antibody fragment, or other targeting moiety [such as scFv, aptamers, DARPins, nanobodies, affibodies, etc.], the RNP can be endocytosed into the cell; the RNP could conceivably be tethered to many other moieties as well), or combinations thereof. Thus, cells can be transformed, transduced, transfected, or otherwise contacted with appropriate nucleic acid molecules of the disclosed CRISPR-Cas system. The resulting cells are recombinant cells. In some examples, contacting the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system includes contacting a cell-free system (such as a biological or environmental sample, or a cell lysate) containing the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system (for example in a diagnostic method to detect a target RNA).

In some examples, at least 2, at least 3, at least 4, at least 5, at least 10, or at least 20 different gRNAs are used. For example, such a method could include targeting at least 2, at least 3, at least 4, at least 5, at least 10, or at least 20 different target RNA molecules, targeting at least 2, at least 3, at least 4, at least 5, at least 10, or at least 20 different regions of one or more RNA molecules, or combinations thereof.

Also provided are isolated nucleic acid molecules encoding such Cas13d proteins, such as a cDNA, genomic DNA, RNA, or mRNA. Such isolated nucleic acid molecules can be part of a vector (such as a plasmid or viral vector), and can be operably linked to a promoter. In some examples, an isolated nucleic acid molecule encoding a Cas13d protein has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 124, 125, 126, 127, 128, 139, 140 or 141; or at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 142, 143, 144, or 145. In an additional example, an isolated nucleic acid encodes a Cas13d protein having at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253.

In some examples, an isolated nucleic acid molecule encoding at least one Cas13d protein (which can be part of a vector) includes at least one Cas13d protein coding sequence that is codon optimized for expression in a eukaryotic cell, for example at least one Cas13d protein coding sequence codon optimized for expression in a human cell. In one example, such a Cas13d coding sequence has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 142, 143, 144, or 145, or has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 124, 125, 126, 127, 128, 139, 140 or 141. In an additional example, a eukaryotic cell codon optimized nucleic acid sequence encodes a Cas13d protein having at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253.
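
The percent sequence identity thresholds recited above can be checked computationally. The following is a minimal sketch, not a method prescribed by this disclosure: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it uses one common convention (identical columns divided by all aligned columns, ignoring columns that are gaps in both sequences). Other conventions exist and give slightly different values. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair reaches the given percent identity (e.g., 80, 85, 90, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences for illustration only; not sequences from the sequence listing.
    print(round(percent_identity("MKRT-L", "MKKTAL"), 2))
```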

In some examples, the gRNA that hybridizes with the one or more target RNA molecules includes one or more direct repeat (DR) sequences, one or more spacer sequences, or one or more sequences comprising DR-spacer-DR-spacer. In some examples, the one or more DR sequences have at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, 169, 176, 178, 180, 182, 184, 186, 188, 190, 191, 192, 193, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, or 254. In one example, the gRNA includes additional sequences, such as an aptamer sequence.

In some examples, a plurality of gRNAs are processed from a single array transcript, wherein each gRNA can be different, for example to target different RNAs or target multiple regions of a single RNA.

In some examples, the DRs are truncated by 1-10 nucleotides (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides) at the 5′ end, for example to be expressed as mature pre-processed guide RNAs.
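
The guide layouts described above (a spacer complementary to the target, a DR-spacer-DR-spacer array, and a DR truncated by 1-10 nucleotides at the 5′ end) can be sketched as simple string operations. This is an illustrative sketch only: the helper names are hypothetical, and the DR used in the demonstration is an arbitrary placeholder rather than any sequence from the sequence listing.

```python
# RNA complement table; input DNA letters are converted to RNA first.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def spacer_for_target(target_rna: str) -> str:
    """Return the spacer (5'->3') that is reverse complementary to target_rna."""
    rna = target_rna.upper().replace("T", "U")
    return rna.translate(_COMPLEMENT)[::-1]


def truncate_dr_5prime(dr: str, n: int) -> str:
    """Remove n nucleotides (1-10) from the 5' end of a direct repeat."""
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if n >= len(dr):
        raise ValueError("truncation would remove the entire direct repeat")
    return dr[n:]


def build_array(dr: str, spacers: list[str]) -> str:
    """Concatenate DR-spacer units: DR-spacer-DR-spacer-..."""
    return "".join(dr + spacer for spacer in spacers)


if __name__ == "__main__":
    placeholder_dr = "ACGUACGUACGUACGUACGU"  # hypothetical placeholder, not a real DR
    spacers = [spacer_for_target("AUGGCC"), spacer_for_target("GGAUCC")]
    print(build_array(placeholder_dr, spacers))
    print(truncate_dr_5prime(placeholder_dr, 3) + spacers[0])
```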

Methods of targeting one or more target RNA molecules are provided. Targeting an RNA molecule can include one or more of cutting or nicking one or more target RNA molecules, activating or upregulating one or more target RNA molecules, activating or suppressing translation of the one or more target RNA molecules, deactivating the one or more target RNA molecules, visualizing, labeling, or detecting the one or more target RNA molecules, binding the one or more target RNA molecules, editing the one or more target RNA molecules, trafficking the one or more target RNA molecules, and masking the one or more target RNA molecules. In some examples, modifying one or more target RNA molecules includes one or more of an RNA base substitution, an RNA base deletion, an RNA base insertion, a break in the target RNA, methylating RNA, and demethylating RNA.

In some examples, such methods are used to treat a disease, such as a disease in a human. In such examples, the one or more target RNA molecules are associated with the disease.

Also provided are isolated proteins, including non-naturally occurring proteins. In some examples, a protein has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253. In some examples, an isolated protein is a Cas13d ortholog from a prokaryotic genome or metagenome, gut metagenome, an activated sludge metagenome, an anaerobic digester metagenome, a chicken gut metagenome, a human gut metagenome, a pig gut metagenome, a bovine gut metagenome, a sheep gut metagenome, a goat gut metagenome, a capybara gut metagenome, a primate gut metagenome, a termite gut metagenome, a fecal metagenome, a genome from the Order Clostridiales, or the Family Ruminococcaceae. In some examples, a Cas13d ortholog includes a Cas13d ortholog from Ruminococcus albus, Eubacterium siraeum, a Ruminococcus flavefaciens strain XPD3002, Ruminococcus flavefaciens FD-1, uncultured Eubacterium sp. TS28-c4095, uncultured Ruminococcus sp., Ruminococcus bicirculans, or Ruminococcus sp. CAG57. Such proteins can include a subcellular localization signal. In some examples, such proteins include a mutation in at least one native HEPN domain.

Also provided are isolated guide RNA (gRNA) molecules. In some examples, an isolated gRNA includes one or more direct repeat (DR) sequences, such as one having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, 169, 176, 178, 180, 182, 184, 186, 188, 190, 191, 192, 193, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, or 254. Such a gRNA can further include one or more spacer sequences specific for (e.g., complementary to) the target RNA. Such guide RNAs can further be optionally truncated by 1-10 nucleotides (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides) at the 5′ end of the DR, for example to generate pre-processed guide RNAs.

Also provided are ribonucleoprotein (RNP) complexes, which include one or more Cas13d proteins provided herein and one or more gRNAs provided herein.

Also provided are recombinant cells that include any Cas13d protein (or nucleic acid molecule encoding Cas13d), any gRNA, any RNP complex, or any vector, provided herein. In one example, the cell is not a bacterial cell. In one example, the cell is a bacterial cell.

Also provided are compositions that include one or more of any Cas13d protein (or nucleic acid molecule encoding Cas13d), any gRNA or array, any RNP complex, any isolated nucleic acid molecule, any vector, or any cell, provided herein. Such compositions can include a pharmaceutically acceptable carrier.

Also provided are kits. Such kits can include one or more of any Cas13d protein (or nucleic acid molecule encoding Cas13d), any gRNA or array, any RNP complex, any isolated nucleic acid molecule, any vector, any cell, or any composition provided herein. Such reagents can be combined or in separate containers.

In some examples, a Cas13d protein is programmed toward its RNA target by combining the protein (or nucleic acid encoding the protein) with an engineered RNA guide (or nucleic acid encoding the RNA guide) consisting of a full or partial direct repeat sequence followed by a “spacer” sequence complementary to the RNA target(s) (or variations thereof, i.e., arrays (DR-spacer-DR-spacer-DR-spacer . . . etc.) or pre-guide RNAs (DR-spacer-DR)). Cas13d proteins can be catalytically inactivated and transformed into RNA binding modules by mutating the conserved RNase HEPN motif (RXXXXH). Exemplary Cas13d proteins and corresponding guides are provided herein (e.g., SEQ ID NOS: 147-170, 175-193 and SEQ ID NOS: 198-254).
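
The RXXXXH pattern named above can be located in a protein sequence with a regular expression. The sketch below is illustrative only. The function names are hypothetical, the toy sequence is not from the sequence listing, and replacing the R and H residues with alanine is an assumption made for the example, since the text states only that the motif is mutated.

```python
import re

# Lookahead so that overlapping RXXXXH occurrences are all reported.
_HEPN_MOTIF = re.compile(r"(?=(R[A-Z]{4}H))")


def find_hepn_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every RXXXXH occurrence."""
    return [(m.start(), m.group(1)) for m in _HEPN_MOTIF.finditer(protein.upper())]


def mutate_motif(protein: str, start: int, replacement: str = "A") -> str:
    """Replace the R and H of the motif at `start` (assumed alanine by default)."""
    seq = list(protein.upper())
    if start < 0 or start + 5 >= len(seq):
        raise ValueError("motif position is out of range")
    if seq[start] != "R" or seq[start + 5] != "H":
        raise ValueError("no RXXXXH motif at this position")
    seq[start] = replacement
    seq[start + 5] = replacement
    return "".join(seq)


if __name__ == "__main__":
    toy = "MKRAAAAHGGRNKLLHP"  # toy sequence for illustration
    print(find_hepn_motifs(toy))
    print(mutate_motif(toy, 2))
```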

A. Cas13d Proteins

Provided herein are novel Cas13d proteins, such as those shown in the sequence listing. SEQ ID NOS: 1, 3, 42, 62, 70, 82, 83, 92, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, and 253 provide distinct full length proteins, and SEQ ID NOS: 2, 4-41, 43-61, 63-69, 71-81, 84-91, 93-113, and 194 provide Cas13d variants and fragments. Such proteins can be used in the disclosed methods, compositions, and kits.

In some examples, a Cas13d protein includes one or more (such as 1 or 2) native HEPN domains. In some examples, a Cas13d protein includes one or more mutated HEPN domains (such a mutant Cas13d protein can process the gRNA, but cannot modify the target RNA). In some examples, a Cas13d protein is no more than 150 kD, no more than 140 kD, no more than 130 kD, no more than 120 kD, such as about 90 to 120 kD, about 100 to 120 kD or about 110 kD.

In addition to the Cas13d proteins provided in Table 1 and in Example 2, the disclosure encompasses Cas13d orthologs from a prokaryotic genome or metagenome, gut metagenome, an activated sludge metagenome, an anaerobic digester metagenome, a chicken gut metagenome, a human gut metagenome, a pig gut metagenome, a bovine gut metagenome, a sheep gut metagenome, a goat gut metagenome, a capybara gut metagenome, a primate gut metagenome, a termite gut metagenome, a fecal metagenome, a genome from the Order Clostridiales, or the Family Ruminococcaceae, such as a Cas13d ortholog from Ruminococcus albus, Eubacterium siraeum, a Ruminococcus flavefaciens strain XPD3002, Ruminococcus flavefaciens FD-1, uncultured Eubacterium sp. TS28-c4095, uncultured Ruminococcus sp., Ruminococcus bicirculans, or Ruminococcus sp. CAG57.

In some examples, a Cas13d protein is at least 800 aa, at least 900 aa, or at least 1000 aa, such as 800 to 1200 aa, 850 to 1050 aa, or 860-1040 aa.
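
The length ranges in amino acids and the mass ranges in kD given above can be related by a rough conversion. The sketch below assumes a rule-of-thumb average residue mass of about 110 Da, which is an approximation not stated in this disclosure; actual masses depend on composition. The function name is hypothetical.

```python
# Assumption: ~110 Da average mass per amino acid residue (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from its length in residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    for length in (800, 930, 1040, 1200):
        print(length, "aa is roughly", approx_mass_kda(length), "kDa")
```

Under this approximation, the 930 aa average size corresponds to roughly 100 kDa, which is in line with the "about 100 to 120 kD" range recited above.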

1. Variant Cas13d Sequences

Cas13d proteins, including variants of the sequences provided herein (such as variants of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) are encompassed within this disclosure. In some examples, Cas13d proteins provided herein can contain one or more mutations, such as a single insertion, a single deletion, a single substitution, or combinations thereof. In some examples, the Cas13d protein includes at least 1, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200 or at least 300 aa insertions, such as 1-20 insertions (for example at the N- or C-terminus or within the protein, such as insertion of a whole small domain), at least 1, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200 or at least 300 aa deletions (such as deletion of a whole small domain), such as 1-20 deletions (for example at the N- or C-terminus or within the protein), at least 1, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30 aa substitutions, such as 1-20 substitutions, or any combination thereof (e.g., single insertion together with 1-19 substitutions), but retains the ability to bind target RNA molecules complementary to the spacer sequence within the gRNA molecule and/or process a guide array RNA transcript into gRNA molecules and/or retains the ability to cleave target RNA. In some examples, the disclosure provides a variant of any disclosed Cas13d protein (such as SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 amino acid changes, but retaining the ability to bind target RNA molecules complementary to the spacer sequence within the gRNA molecule and/or process a guide array RNA transcript into gRNA molecules. In some examples, any disclosed Cas13d protein (such as SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) further includes 1-8 amino acid insertions, 1-15 amino acid deletions, 1-10 amino acid substitutions, or any combination thereof (e.g., 1-15, 1-4, or 1-5 amino acid deletions together with 1-10, 1-5 or 1-7 amino acid substitutions), with the retained ability to bind target RNA molecules complementary to the spacer sequence within the gRNA molecule and/or process a guide array RNA transcript into gRNA molecules. In one example, such variant peptides are produced by manipulating the nucleotide sequence encoding a peptide using standard procedures such as site-directed mutagenesis or PCR. Such variants can also be chemically synthesized.

In some examples, a Cas13d protein includes a motif shown in SEQ ID NO: 195, Motif 2, or Motif 3. Thus, a Cas13d protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253 in some examples includes at least one motif shown in SEQ ID NO: 195, Motif 2, or Motif 3.

One type of modification or mutation includes the substitution of amino acids for amino acid residues having a similar biochemical property, that is, a conservative substitution (such as 1-4, 1-8, 1-10, or 1-20 conservative substitutions). Typically, conservative substitutions have little to no impact on the activity of a resulting peptide. For example, a conservative substitution is an amino acid substitution in a Cas13d protein (such as SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) that does not substantially affect the ability of the Cas13d protein to bind target RNA molecules complementary to the spacer sequence within the gRNA molecule and/or process a guide array RNA transcript into gRNA molecules. An alanine scan can be used to identify which amino acid residues in a Cas13d protein (such as SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can tolerate an amino acid substitution. In one example, the ability of a variant Cas13d protein (such as a variant of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) to modify gene expression in a CRISPR/Cas system is not altered by more than 25%, for example not more than 20%, for example not more than 10%, when an alanine, or other conservative amino acid, is substituted for 1-4, 1-8, 1-10, or 1-20 native amino acids. Examples of amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions include: Ser for Ala; Lys for Arg; Gln or His for Asn; Glu for Asp; Ser for Cys; Asn for Gln; Asp for Glu; Pro for Gly; Asn or Gln for His; Leu or Val for Ile; Ile or Val for Leu; Arg or Gln for Lys; Leu or Ile for Met; Met, Leu or Tyr for Phe; Thr for Ser; Ser for Thr; Tyr for Trp; Trp or Phe for Tyr; and Ile or Leu for Val.
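
The conservative substitutions listed above can be encoded as a lookup table. The sketch below transcribes that list into one-letter codes and classifies each difference between a native and a variant sequence of equal length. The function names are hypothetical, and the table is directional exactly as written in the list (for example, Ser for Ala is listed, but Ala for Ser is not).

```python
# original residue -> residues listed above as conservative replacements
CONSERVATIVE = {
    "A": {"S"}, "R": {"K"}, "N": {"Q", "H"}, "D": {"E"}, "C": {"S"},
    "Q": {"N"}, "E": {"D"}, "G": {"P"}, "H": {"N", "Q"}, "I": {"L", "V"},
    "L": {"I", "V"}, "K": {"R", "Q"}, "M": {"L", "I"}, "F": {"M", "L", "Y"},
    "S": {"T"}, "T": {"S"}, "W": {"Y"}, "Y": {"W", "F"}, "V": {"I", "L"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if `replacement` is listed as a conservative substitute for `original`."""
    return replacement.upper() in CONSERVATIVE.get(original.upper(), set())


def classify_substitutions(native: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, native residue, variant residue, conservative?) per change."""
    if len(native) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [
        (i, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(native.upper(), variant.upper()), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(classify_substitutions("MKAF", "MRSW"))
```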

One method for identifying regions particularly amenable to insertions, substitutions or deletions is to target stretches of amino acids exhibiting low levels of conservation between orthologs. Such regions are indicated in the conservation graph of the alignment of Cas13d proteins provided in FIG. 1B. Conserved residues of Cas13d are further marked in the Cas13d protein alignment provided in FIGS. 18A-18MMM (indicated by symbols “.” “:” or “*” below aligned conserved residues). Examples of deletions and their functional testing are further provided in FIGS. 16A-16B.
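
The idea of locating poorly conserved stretches in an ortholog alignment can be sketched as follows. This is an illustrative approach only, not the procedure used to generate the figures: the per-column score (fraction of sequences sharing the most common residue), the window size, and the threshold are all assumptions, and the function names are hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per-column fraction of sequences that share the most common non-gap residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def low_conservation_windows(scores: list[float], window: int = 3,
                             threshold: float = 0.5) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges whose mean score is below threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < threshold:
            hits.append((start, start + window))
    return hits


if __name__ == "__main__":
    # Toy alignment for illustration; not taken from the figures.
    toy_alignment = ["MKRLAGS", "MKRIDTS", "MKRVE-S"]
    print(low_conservation_windows(column_conservation(toy_alignment)))
```

Windows reported by such a scan are only candidates; any insertion, deletion, or substitution would still need functional testing.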

Another type of substitution can be achieved by swapping out parts of one ortholog with the homologous region of another ortholog to obtain a combined “chimeric” protein. Such a chimeric protein may combine favorable properties of multiple Cas13d orthologs.

More substantial changes can be made by using substitutions that are less conservative, e.g., selecting residues that differ more significantly in their effect on maintaining: (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a sheet or helical conformation; (b) the charge or hydrophobicity of the polypeptide at the target site; or (c) the bulk of the side chain. The substitutions that in general are expected to produce the greatest changes in polypeptide function are those in which: (a) a hydrophilic residue, e.g., serine or threonine, is substituted for (or by) a hydrophobic residue, e.g., leucine, isoleucine, phenylalanine, valine or alanine; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysine, arginine, or histidine, is substituted for (or by) an electronegative residue, e.g., glutamic acid or aspartic acid; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine.

Thus, the disclosure provides Cas13d proteins having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253, or combinations (e.g., chimeras) thereof.

In one example, a Cas13d protein includes non-naturally occurring amino acids.

2. Cas13d Proteins with Other Elements

A Cas13d protein (such as any of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can include other elements or domains, for example at the N- or C-terminus (or both). The resulting protein can be referred to as a Cas13d fusion protein. In one example, a Cas13d protein provided herein (such as a native Cas13d or a Cas13d with mutated HEPN domain(s)) includes a subcellular localization signal. Exemplary subcellular localization signals include an organelle localization signal, such as a nuclear localization signal (NLS), nuclear export signal (NES), or mitochondrial localization signal. In one example, a Cas13d protein includes an NLS, such as SPKKKRKVEAS (SEQ ID NO: 256; e.g., encoded by AGCCCCAAGAAgAAGAGaAAGGTGGAGGCCAGC, SEQ ID NO: 257) or GPKKKRKVAAA (SV40 large T antigen NLS, SEQ ID NO: 258; e.g., encoded by ggacctaagaaaaagaggaaggtggcggccgct, SEQ ID NO: 260). Exemplary NES that can be part of a Cas13d protein include an adenovirus type 5 E1B nuclear export sequence, an HIV nuclear export sequence, a MAPK nuclear export sequence, or a PTK2 nuclear export sequence.
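
The relationship between the two NLS peptides and their exemplary coding sequences given above can be checked by translating the nucleotide sequences with the standard genetic code. The sketch below is a minimal, self-contained translator; the function name is hypothetical, and the codon table is built from the conventional TCAG ordering of the standard code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (case-insensitive) in frame 1."""
    seq = dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    print(translate("AGCCCCAAGAAgAAGAGaAAGGTGGAGGCCAGC"))
    print(translate("ggacctaagaaaaagaggaaggtggcggccgct"))
```

The mixed-case letters in the first coding sequence are handled by upper-casing the input before lookup.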

In some examples, the at least one Cas13d protein (such as a native Cas13d or a Cas13d with mutated HEPN domain(s)) further includes one or more effector domains. Exemplary effector domains include proteins and/or enzymes, such as those that can cleave RNA (e.g., a PIN endonuclease domain, an NYN domain, an SMR domain from SOT1, or an RNase domain from Staphylococcal nuclease), those that can affect RNA stability (e.g., tristetraprolin (TTP) or domains from UPF1, EXOSC5, and STAU1), those that can edit a nucleotide or ribonucleotide (e.g., a cytidine deaminase, PPR protein, adenosine deaminase, ADAR family protein, or APOBEC family protein), those that can activate translation (e.g., eIF4E and other translation initiation factors, a domain of the yeast poly(A)-binding protein or GLD2), those that can repress translation (e.g., Pumilio or FBF PUF proteins, deadenylases, CAF1, Argonaute proteins), those that can methylate RNA (e.g., domains from m6A methyltransferase factors such as METTL14, METTL3, or WTAP), those that can demethylate RNA (e.g., human alkylation repair homolog 5 or Alkbh5), those that can affect splicing (e.g., the RS-rich domain of SRSF1, the Gly-rich domain of hnRNP A1, the alanine-rich motif of RBM4, or the proline-rich motif of DAZAP1), those that can enable affinity purification or immunoprecipitation (e.g., FLAG, HA, biotin, or HALO tags), and those that can enable proximity-based protein labeling and identification (e.g., a biotin ligase (such as BirA) or a peroxidase (such as APEX2) in order to biotinylate proteins that interact with the target RNA).

In some examples, the Cas13d protein and effector module combination can constitute a transcriptional sensor. For example, the transcriptional sensor can be composed of at least one Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4), at least one gRNA containing at least one spacer sequence specific for the target RNA, and an effector module such as an optionally split fluorescent protein or probe (e.g., a split Venus fluorescent protein, a split GFP, a split enhanced GFP, a split mCherry, a split super-folder mCherry, and other fluorescent protein variants such as ECFP, YFP, RFP, and derivatives or fragments thereof); an optionally split luminescent protein or probe (e.g., Gaussia, Firefly, NanoLuc, or Renilla variants); an optionally split enzyme (e.g., ubiquitin or TEV protease); a FRET-compatible protein pair; one or more transcription factor(s) fused to Cas13d via cleavable linkers (e.g., an artificial GAL4, zinc finger, transcriptional activator like effector (TALE), CRISPR-Cas9, CRISPR-Cpf1, or TetR-based transcription factor or an endogenous transcription factor); a split intein that trans-splices a protein to restore its function such as a transcription factor (e.g., an intein from Rhodothermus marinus or DnaE); a kinase-substrate pair that activates upon phosphorylation (e.g., TYK2-STAT3); one, two, or more monomers that activate upon dimerization or multimerization (e.g., caspase 9); or one or more proteins that induce conformational and functional change upon interaction. In one example, the spatial proximity of two or more Cas13d proteins and gRNAs due to binding a particular transcript would activate the effector module, resulting in a detectable signal or detectable activity in the cell.

In one example, the effector domain is fused to a protein that specifically recognizes and binds an RNA aptamer, such as one that can be appended to or inserted within a gRNA molecule (e.g., an MS2, PP7, Qβ, or other aptamer). This aptamer-binding protein-effector domain fusion can be used to target the target RNA because the Cas13d and gRNA complex will guide the aptamer-binding protein-effector domain into proximity with the target RNA.

In another example, the aptamer can be directly inserted into the gRNA molecule to permit detection of a target RNA, such as a fluorophore aptamer (e.g., Spinach, Mango, etc.).

In some examples, the Cas13d protein (such as a native Cas13d or a Cas13d with mutated HEPN domain(s)) includes a purification tag, such as an HA-tag, His-tag (such as 6-His), Myc-tag, E-tag, S-tag, calmodulin tag, FLAG-tag, GST-tag, MBP-tag, and the like. Such tags are in some examples at the N- or C-terminal end of the Cas13d protein.

In some examples, a Cas13d protein (such as a native Cas13d or a Cas13d with mutated HEPN domain(s)) includes one or more subcellular localization signals, effector domains, and purification tags.

In some examples, a Cas13d protein may be split into multiple fragments, which are then expressed individually. Such fragments of Cas13d may be optionally fused to other protein domains. In one example, a Cas13d can be split into two halves, which are then fused to two parts of an inducible heterodimer pair. Upon induction of heterodimer binding, the Cas13d halves are recruited to each other to form an active protein. Such a system would allow for the inducible control of Cas13d activity. Useful heterodimer pairs include two proteins that dimerize upon light illumination or through administration of a small molecule compound, amongst others. Specific examples of heterodimer pairs include but are not limited to: light inducible Magnets proteins, the light inducible iLID-SspB pair, the light inducible Cryptochrome2-CIB1 dimer and the small molecule inducible FKBP protein. In another example of a split Cas13d design, two halves of the Cas13d protein may be fused to protein trans-splicing domains. Such a design would enable the separate expression of two halves which are reconstituted into a full length protein once expressed inside a cell. An example of such trans-splicing domains is the intein system.

One method for identifying regions particularly amenable to splitting of the protein is to identify stretches of amino acids exhibiting low levels of conservation between orthologs. Such regions are indicated in the conservation graph of the alignment of Cas13d proteins provided in FIG. 1B. Regions of conserved residues of Cas13d are further marked in the Cas13d protein alignment provided in FIG. 18A-18MMM (indicated by symbols . : or * below aligned conserved residues).
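As a rough illustration of the idea of scanning for poorly conserved stretches, the following sketch flags windows of alignment columns whose mean conservation score falls below a cutoff. The per-column scores, the window size, the threshold, and the function name `low_conservation_regions` are all hypothetical choices made for this example; they are not taken from FIG. 1B or from any particular alignment tool.

```python
from typing import List, Tuple


def low_conservation_regions(
    scores: List[float], window: int = 5, threshold: float = 0.3
) -> List[Tuple[int, int]]:
    """Return merged (start, end) ranges (0-based, end exclusive) covered by
    sliding windows whose mean conservation score is below `threshold`.

    `scores` is assumed to hold one conservation value per alignment column,
    scaled from 0 (variable) to 1 (fully conserved). This is an assumption
    made for illustration only.
    """
    regions: List[Tuple[int, int]] = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Hypothetical scores: a variable stretch flanked by conserved stretches.
example_scores = [0.9] * 10 + [0.1] * 6 + [0.9] * 10
candidate_split_regions = low_conservation_regions(example_scores)
```

Ranges returned by such a scan would only be candidates; whether a given split site yields functional halves would still need to be tested experimentally.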

3. Generation of Cas13d Proteins

In one example, the Cas13d protein is expressed in vitro, for example, in a prokaryotic cell (e.g., bacteria such as Lactobacillus, Lactococcus, Bacillus (such as B. subtilis), Escherichia (such as E. coli), Salmonella typhimurium, and Clostridium), archaea cell, plant or plant cell, fungal cell (e.g., Neurospora), yeast cell (e.g., Saccharomyces or Pichia (such as S. cerevisiae or P. pastoris), or Kluyveromyces lactis), insect cell (e.g., SF9 cells), or mammalian cell (e.g., 293 cells, or immortalized mammalian myeloid and lymphoid cell lines). Once expressed, the Cas13d protein can be isolated and/or purified (e.g., using chromatography or immunological separation). In some examples, a tag on the Cas13d protein permits isolation of the protein from a culture medium. Exemplary procedures include ammonium sulfate precipitation, affinity columns, column chromatography, and the like (see, generally, R. Scopes, Protein Purification, Springer-Verlag, N.Y., 1982). Substantially pure compositions of at least about 90 to 95% homogeneity, such as 98% to 99% homogeneity, can be used in the methods provided herein. For example, a purified preparation of a Cas13d protein can be used as an alternative to expressing the Cas13d protein from a nucleic acid molecule in the CRISPR/Cas system.

In addition to recombinant methods, Cas13d proteins disclosed herein can also be constructed in whole or in part using native chemical ligation and/or expressed protein ligation.

B. Nucleic Acid Molecules Encoding Cas13d Proteins

Nucleic acid molecules encoding a Cas13d protein are encompassed by this disclosure. Nucleic acid molecules include DNA, genomic DNA, cDNA, mRNA, and RNA sequences which encode a Cas13d peptide. Such nucleic acid molecules can include naturally occurring or non-naturally occurring nucleotides or ribonucleotides. Exemplary nucleic acid molecules that encode the novel Cas13d proteins shown in SEQ ID NOS: 1, 3, 42, 62, 70, 82, 83, 92 and 104, are shown in SEQ ID NOS: 124-128, 139, 140, and 141. Also provided are codon optimized nucleic acid molecules that encode the novel Cas13d proteins, for example those optimized for expression in a mammalian cell, such as a human cell (SEQ ID NOS: 114-123 and 142-145). For example, SEQ ID NOS: 114, 118, and 122 provide nucleic acid molecules optimized for expression in human cells. SEQ ID NOS: 115, 119 and 123 provide nucleic acid molecules optimized for expression in human cells, and which encode mutant HEPN sites. SEQ ID NOS: 116 and 120 provide nucleic acid molecules optimized for expression in human cells, and which include an N-terminal nuclear localization signal (NLS) coding sequence (namely, SPKKKRKVEAS). SEQ ID NOS: 117 and 121 provide nucleic acid molecules optimized for expression in human cells, and which include N-terminal and C-terminal NLS coding sequences (namely, SPKKKRKVEAS, SEQ ID NO: 256, and GPKKKRKVAAA, SEQ ID NO: 258, respectively).

In one example, a nucleic acid sequence encodes a Cas13d protein having at least 60%, at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, or at least 99% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253. Such nucleic acid sequences can be generated based on the amino acid sequences provided herein, and the genetic code. In one example, a Cas13d nucleic acid sequence has at least 70%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 124, 125, 126, 127, 128, 139, 140 or 141. In one example, a Cas13d nucleic acid sequence is optimized for expression in mammalian cells, such as human cells, such as one having at least 70%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 142, 143, 144, or 145.

One of skill can readily construct a variety of clones containing functionally equivalent nucleic acids, such as nucleic acids which differ in sequence but which encode the same Cas13d protein sequence. Silent mutations in the coding sequence result from the degeneracy (i.e., redundancy) of the genetic code, whereby more than one codon can encode the same amino acid residue. Thus, for example, leucine can be encoded by CTT, CTC, CTA, CTG, TTA, or TTG; serine can be encoded by TCT, TCC, TCA, TCG, AGT, or AGC; asparagine can be encoded by AAT or AAC; aspartic acid can be encoded by GAT or GAC; cysteine can be encoded by TGT or TGC; alanine can be encoded by GCT, GCC, GCA, or GCG; glutamine can be encoded by CAA or CAG; tyrosine can be encoded by TAT or TAC; and isoleucine can be encoded by ATT, ATC, or ATA. Tables showing the standard genetic code can be found in various sources (see, for example, Stryer, 1988, Biochemistry, 3rd Edition, W.H. Freeman and Co., NY).

Based on the genetic code, nucleic acid sequences coding for any Cas13d sequence can be generated. In some examples, such a sequence is optimized for expression in a host or target cell, such as a host cell used to express the Cas13d protein or a cell in which the disclosed methods are practiced (such as in a mammalian cell, such as a human cell). Codon preferences and codon usage tables for a particular species can be used to engineer isolated nucleic acid molecules encoding a Cas13d (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253 that takes advantage of the codon usage preferences of that particular species). For example, the nucleic acids encoding the Cas13d proteins disclosed herein can be designed to have codons that are preferentially used by a particular organism of interest. In one example, a Cas13d nucleic acid sequence is optimized for expression in human cells, such as one having at least 70%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 142, 143, 144, or 145.
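One simple way to picture the use of a codon usage table is to choose, for each residue, the codon with the highest usage value in the host of interest. The sketch below assumes a small usage table with made-up relative frequencies; the numbers, the table name `HYPOTHETICAL_USAGE`, and the function `back_translate` are placeholders for illustration and are not the optimization procedure used to produce SEQ ID NOS: 114-123 or 142-145.

```python
from typing import Dict

# Hypothetical relative usage values per codon (illustrative numbers only).
# A real workflow would load a published usage table for the host species.
HYPOTHETICAL_USAGE: Dict[str, Dict[str, float]] = {
    "L": {"CTT": 0.13, "CTC": 0.20, "CTA": 0.07, "CTG": 0.40, "TTA": 0.07, "TTG": 0.13},
    "S": {"TCT": 0.18, "TCC": 0.22, "TCA": 0.15, "TCG": 0.06, "AGT": 0.15, "AGC": 0.24},
    "N": {"AAT": 0.46, "AAC": 0.54},
    "D": {"GAT": 0.46, "GAC": 0.54},
}


def back_translate(peptide: str, usage: Dict[str, Dict[str, float]]) -> str:
    """Pick the highest-usage codon for each residue in `peptide`."""
    codons = []
    for residue in peptide:
        table = usage[residue]  # KeyError if the residue is not in the table
        codons.append(max(table, key=table.get))
    return "".join(codons)


optimized = back_translate("LSND", HYPOTHETICAL_USAGE)
```

Always taking the single most frequent codon is the simplest possible strategy; practical designs often also weigh GC content, repeats, and unwanted motifs, none of which is modeled here.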

A nucleic acid encoding a Cas13d protein (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can be cloned or amplified by in vitro methods, such as the polymerase chain reaction (PCR), the ligase chain reaction (LCR), the transcription-based amplification system (TAS), the self-sustained sequence replication system (3SR) and the Qβ replicase amplification system (QB). In addition, nucleic acids encoding a Cas13d protein (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can be prepared by cloning techniques. Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through cloning are found in Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, and Ausubel et al., (1987) in “Current Protocols in Molecular Biology,” John Wiley and Sons, New York, N.Y.

Nucleic acid sequences encoding a Cas13d protein (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can be prepared by any suitable method including, for example, cloning of appropriate sequences or by direct chemical synthesis by methods such as the phosphotriester method of Narang et al., Meth. Enzymol. 68:90-99, 1979; the phosphodiester method of Brown et al., Meth. Enzymol. 68:109-151, 1979; the diethylphosphoramidite method of Beaucage et al., Tetra. Lett. 22:1859-1862, 1981; the solid phase phosphoramidite triester method described by Beaucage & Caruthers, Tetra. Letts. 22(20):1859-1862, 1981, for example, using an automated synthesizer as described in, for example, Needham-VanDevanter et al., Nucl. Acids Res. 12:6159-6168, 1984; and, the solid support method of U.S. Pat. No. 4,458,066. Chemical synthesis produces a single stranded oligonucleotide. This can be converted into double stranded DNA by hybridization with a complementary sequence, or by polymerization with a DNA polymerase using the single strand as a template. One of skill would recognize that while chemical synthesis of DNA is generally limited to sequences of about 100 bases, longer sequences may be obtained by the ligation of shorter sequences.
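The relationship between the roughly 100-base synthesis limit and ligation of shorter pieces can be sketched as a simple partition of a long sequence into consecutive fragments. The function name `split_for_synthesis` and the default limit are assumptions for illustration; the sketch ignores practical details such as overlaps or sticky ends that an actual assembly design would need.

```python
from typing import List


def split_for_synthesis(sequence: str, max_len: int = 100) -> List[str]:
    """Cut `sequence` into consecutive fragments of at most `max_len` bases.

    Joining the fragments in order reproduces the original sequence, which
    models the ligation of shorter synthesized pieces into a longer one.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]


# A 250-base placeholder sequence (not a real Cas13d coding sequence).
placeholder = "ACGTA" * 50
fragments = split_for_synthesis(placeholder)
fragment_lengths = [len(f) for f in fragments]   # three fragments
reassembled = "".join(fragments)
```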

In one example, a Cas13d protein (such as a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) is prepared by inserting the cDNA which encodes the Cas13d protein into a plasmid or vector. The insertion can be made so that the Cas13d coding sequence is read in frame and the Cas13d protein is produced.

The Cas13d nucleic acid coding sequence (such as one having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can be inserted into an expression vector including, but not limited to, a plasmid, virus or other vehicle that can be manipulated to allow insertion or incorporation of sequences and can be expressed in either prokaryotes or eukaryotes. Hosts can include microbial, yeast, insect, plant and mammalian organisms. The vector can encode a selectable marker, such as a thymidine kinase gene or antibiotic resistance gene.

Nucleic acid sequences encoding a Cas13d protein (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can be operatively linked to expression control sequences. An expression control sequence operatively linked to a Cas13d coding sequence is ligated such that expression of the Cas13d protein coding sequence is achieved under conditions compatible with the expression control sequences. The expression control sequences include, but are not limited to, appropriate promoters, enhancers, transcription terminators, a start codon (i.e., ATG) in front of a Cas13d protein-encoding gene, splicing signals for introns, maintenance of the correct reading frame of that gene to permit proper translation of mRNA, and stop codons.
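The start codon, reading frame, and stop codon requirements listed above can be expressed as a quick check on a candidate coding sequence. This is a minimal sketch under simple assumptions (DNA alphabet, standard stop codons TAA, TAG, and TGA, no introns); the function name `is_in_frame_orf` is hypothetical, and the check says nothing about promoters, enhancers, or other control elements.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_orf(cds: str) -> bool:
    """Return True if `cds` starts with ATG, has a length divisible by 3,
    ends with a stop codon, and contains no earlier in-frame stop codon."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the final position would truncate translation.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


# Toy sequences for illustration only (not Cas13d coding sequences).
intact = is_in_frame_orf("ATGGCTTAA")          # start, one codon, stop
frameshifted = is_in_frame_orf("ATGGCTTA")     # length not divisible by 3
premature_stop = is_in_frame_orf("ATGTAAGCTTAA")
```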

In one embodiment, vectors are used for expression in yeast such as S. cerevisiae, P. pastoris, or Kluyveromyces lactis. Exemplary promoters for use in yeast expression systems include but are not limited to: the constitutive promoters plasma membrane H⁺-ATPase (PMA1), glyceraldehyde-3-phosphate dehydrogenase (GPD), phosphoglycerate kinase-1 (PGK1), alcohol dehydrogenase-1 (ADH1), and pleiotropic drug-resistant pump (PDR5). In addition, many inducible promoters are of use, such as GAL1-10 (induced by galactose), PHO5 (induced by low extracellular inorganic phosphate), and tandem heat shock HSE elements (induced by temperature elevation to 37° C.). Promoters that direct variable expression in response to a titratable inducer include the methionine-responsive MET3 and MET25 promoters and copper-dependent CUP1 promoters. Any of these promoters may be cloned into multicopy (2μ) or single copy (CEN) plasmids to give an additional level of control in expression level. The plasmids can include nutritional markers (such as URA3, ADE3, HIS1, and others) for selection in yeast and antibiotic resistance (AMP) for propagation in bacteria. Plasmids for expression in K. lactis are known, such as pKLAC1.

Viral vectors can also be prepared that encode a Cas13d (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253). Exemplary viral vectors include polyoma, SV40, adenovirus, vaccinia virus, adeno-associated virus, herpes viruses including HSV and EBV, lentivirus, Sindbis viruses, alphaviruses and retroviruses of avian, murine, and human origin. Baculovirus (Autographa californica multinuclear polyhedrosis virus; AcMNPV) vectors can be used and obtained from commercial sources. Other suitable vectors include retrovirus vectors, orthopox vectors, avipox vectors, fowlpox vectors, capripox vectors, suipox vectors, adenoviral vectors, herpes virus vectors, alpha virus vectors, baculovirus vectors, Sindbis virus vectors, vaccinia virus vectors and poliovirus vectors. Specific exemplary vectors are poxvirus vectors such as vaccinia virus, fowlpox virus and a highly attenuated vaccinia virus (MVA), adenovirus, baculovirus and the like. Pox viruses of use include orthopox, suipox, avipox, and capripox virus. Orthopox include vaccinia, ectromelia, and raccoon pox. One example of an orthopox of use is vaccinia. Avipox includes fowlpox, canary pox and pigeon pox. Capripox include goatpox and sheeppox. In one example, the suipox is swinepox. Other viral vectors that can be used include other DNA viruses such as herpes simplex virus and adenoviruses, and RNA viruses such as retroviruses and polio.

Viral vectors that encode a Cas13d protein (such as one encoding a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) can include at least one expression control element operationally linked to the nucleic acid sequence encoding the Cas13d protein. The expression control elements control and regulate the expression of the Cas13d nucleic acid sequence. Exemplary expression control elements that can be used include, but are not limited to, lac system, operator and promoter regions of phage lambda, yeast promoters and promoters derived from polyoma, adenovirus, retrovirus or SV40. In one example, the promoter is CMV, U6, CBh, CMW, Cbh, or EF1a. In one example, the promoter is a cell type specific promoter, such as synapsin or GFAP, or an inducible promoter, such as a tetracycline inducible promoter. Additional operational elements include, but are not limited to, leader sequence, termination codons, polyadenylation signals and any other sequences necessary for the appropriate transcription and subsequent translation of the nucleic acid sequence encoding the Cas13d protein in the host system. The expression vector can contain additional elements necessary for the transfer and subsequent replication of the expression vector containing the nucleic acid sequence in the host system. Examples of such elements include, but are not limited to, origins of replication and selectable markers.

In one example, the vector includes a polyA signal after the Cas13d protein coding sequence, a WPRE signal for expression in viral vectors, or combinations thereof.

In one example, the method uses direct delivery of an mRNA that encodes a Cas13d protein.

C. Guide Nucleic Acid Molecules

The disclosure provides guide nucleic acid molecules, such as guide RNA (gRNA or crRNA, CRISPR (guide) RNA), which can be used in the methods, compositions, and kits provided herein. Such molecules can include naturally occurring or non-naturally occurring nucleotides or ribonucleotides (such as LNAs or other chemically modified nucleotides or ribonucleotides, for example to protect a guide RNA from degradation). In some examples, the guide sequence is RNA. The guide nucleic acid can include modified bases or chemical modifications (e.g., see Latorre et al., Angewandte Chemie 55:3548-50, 2016). A guide sequence directs a Cas13d protein to a target RNA, thereby targeting the RNA (e.g., modifying or detecting the RNA).

Guide molecules include one or more regions referred to as spacers. A spacer has sufficient complementarity with a target RNA sequence to hybridize with the target RNA and direct sequence-specific binding of a Cas13d protein to the target RNA. Thus, the spacer is the variable portion of the guide sequence. In some examples, a spacer has 100% complementarity to a target RNA (or region of the RNA to be targeted), but a spacer can have less than 100% complementarity to a target RNA, such as at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or at least 99% complementarity to a target RNA.
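Percent complementarity between a spacer and its target region can be pictured as the fraction of antiparallel positions that form Watson-Crick pairs. The sketch below makes several simplifying assumptions that are not stated in the text: both strands are written 5′ to 3′ in the RNA alphabet, they have equal length, and G-U wobble pairs count as mismatches. The function name `percent_complementarity` and the example sequences are made up for illustration.

```python
# Watson-Crick RNA base pairs only (G-U wobble is deliberately excluded).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(spacer: str, target: str) -> float:
    """Percentage of spacer positions that pair with the target region.

    Both sequences are read 5' to 3'; hybridization is antiparallel, so the
    spacer is compared against the target read in reverse.
    """
    if len(spacer) != len(target) or not spacer:
        raise ValueError("spacer and target must be non-empty and equal length")
    matches = sum(
        1 for s, t in zip(spacer, reversed(target)) if PAIR.get(s) == t
    )
    return 100.0 * matches / len(spacer)


# Toy 10-nt example (real spacers are longer; see the lengths discussed herein).
target_site = "AUGGCUAGCU"
perfect_spacer = "AGCUAGCCAU"     # exact reverse complement of target_site
one_mismatch = "CGCUAGCCAU"       # first position altered

full = percent_complementarity(perfect_spacer, target_site)
partial = percent_complementarity(one_mismatch, target_site)
```

A value computed this way could be compared against the thresholds recited above (for example, at least 80% or at least 90%), keeping in mind that sequence complementarity alone does not predict binding or cleavage efficiency.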

A guide sequence can also include one or more direct repeats (DRs). The DR is the constant portion of the guide and contains strong secondary structure (FIG. 3C), which facilitates interaction between a Cas13d protein and the guide molecule. Each ortholog has a slightly different DR sequence (e.g., SEQ ID NOS: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, and 169). In one example, the gRNA includes at least one DR sequence having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, 169, 176, 178, 180, 182, 184, 186, 188, 190, 191, 192, 193, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, or 254 (such as 1, 2, 3, or 4 of such DR sequences).

In one example, a guide sequence includes a constant DR on its 5′ end and a variable spacer on its 3′ end. In one example, the guide includes the sequence DR-spacer-DR-spacer. In some examples, the sequence DR-spacer is repeated two or more times, such as at least 3 times or at least 4 times. This type of sequence is called a guide array.

Guide molecules generally exist in various states of processing. In one example, an unprocessed guide RNA is 36 nt of DR followed by 30-32 nt of spacer. The guide RNA is processed (truncated/modified) by Cas13d itself or other RNases into the shorter “mature” form. In some embodiments, an unprocessed guide sequence is about, or at least about 30, 35, 40, 45, 50, 55, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, or more nucleotides (nt) in length. In some embodiments, a processed guide sequence is about 44 to 60 nt (such as 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nt). In some embodiments, an unprocessed spacer is about 28-32 nt long (such as 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nt) while the mature (processed) spacer can be about 10 to 30 nt, 10 to 25 nt, 14 to 25 nt, 20 to 22 nt, or 14-30 nt (such as 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nt). In some embodiments, an unprocessed DR is about 36 nt (such as 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 41 nt), while the processed DR is about 30 nt (such as 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nt).
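The length arithmetic in the example above (a 36 nt DR followed by a 30-32 nt spacer, giving a 66-68 nt unprocessed guide) can be checked with a small sketch that concatenates DR-spacer units. The DR used here is a string of `N` placeholders rather than any of the disclosed DR sequences, and the function name `build_guide_array` is an assumption made for this illustration.

```python
from typing import List


def build_guide_array(dr: str, spacers: List[str]) -> str:
    """Concatenate DR-spacer units in order: DR-spacer-DR-spacer-..."""
    return "".join(dr + spacer for spacer in spacers)


# Placeholder sequences: 'N' stands for any nucleotide. These are not the
# DR sequences of the sequence listing.
PLACEHOLDER_DR = "N" * 36

single_short = build_guide_array(PLACEHOLDER_DR, ["A" * 30])
single_long = build_guide_array(PLACEHOLDER_DR, ["A" * 32])
three_unit_array = build_guide_array(PLACEHOLDER_DR, ["A" * 30, "C" * 30, "G" * 30])

lengths = (len(single_short), len(single_long), len(three_unit_array))
```

With these inputs the single guides come to 66 and 68 nt and the three-unit array to 198 nt, before any processing into the shorter mature form.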

The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target RNA may be assessed by any suitable assay. For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target RNA molecule, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence. Similarly, cleavage of a target RNA sequence may be evaluated in a test tube by providing the target RNA, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target RNA between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.

Also provided are vectors, such as a viral vector or plasmid (e.g., retrovirus, lentivirus, adenovirus, adeno-associated virus, or herpes simplex virus), that include a guide nucleic acid molecule. Exemplary vectors are described herein. In some examples, the guide nucleic acid molecule is operably linked to a promoter or expression control element (examples of which are provided elsewhere in this application). As described elsewhere herein, such vectors can include other elements, such as a gene encoding a selectable marker (such as resistance to an antibiotic, for example puromycin or hygromycin) or a detectable marker such as GFP or other fluorophore.

In one example, a plurality of gRNAs are part of an array (which can be part of a vector, such as a viral vector or plasmid). For example, a guide array including the sequence DR-spacer-DR-spacer-DR-spacer can include three unique unprocessed gRNAs (one for each DR-spacer sequence). Once introduced into a cell or cell-free system, the array is processed by the Cas13d protein into the three individual mature gRNAs. This allows for multiplexing, e.g., the delivery of multiple gRNAs to a cell or system to target multiple target RNAs or multiple positions within a single target RNA (or combinations thereof).
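The relationship between an array and its individual DR-spacer units can be modeled as a string operation: given the known DR, the array is cut at each DR occurrence to recover one unit per spacer. This is only a bookkeeping sketch. It does not model the actual Cas13d processing chemistry or the trimming to the mature form, and the short DR and spacer strings, as well as the function name `split_guide_array`, are hypothetical placeholders.

```python
from typing import List


def split_guide_array(array: str, dr: str) -> List[str]:
    """Split DR-spacer-DR-spacer-... into its individual DR-spacer units.

    Assumes the array starts with `dr` and that `dr` does not occur inside
    any spacer (both are assumptions of this toy model).
    """
    if not dr or not array.startswith(dr):
        raise ValueError("array must start with the direct repeat")
    spacers = array.split(dr)[1:]   # first element is the empty prefix
    return [dr + spacer for spacer in spacers]


# Hypothetical short placeholders, not sequences from the disclosure.
TOY_DR = "GGGGCCCC"
toy_spacers = ["AUAUAUAU", "UAUAUAUA", "AAUUAAUU"]
toy_array = "".join(TOY_DR + s for s in toy_spacers)

units = split_guide_array(toy_array, TOY_DR)
```

Each recovered unit corresponds to one unprocessed gRNA of the array, so a three-unit array yields three guides that could address three target RNAs or three positions within one target RNA.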

D. Vectors that Encode Cas13d and Guide Nucleic Acid Molecules

The disclosure provides vectors, such as plasmids and viral vectors as described elsewhere herein, which include one or more guide molecule coding sequences (e.g., to permit targeting of one or more RNA molecules), and one or more Cas13d protein coding sequences. Such vectors can be used in the methods, compositions, and kits provided herein. Such vectors can include naturally occurring or non-naturally occurring nucleotides or ribonucleotides. Such vectors can include a single promoter operably linked to the guide molecule (which can be part of an array that includes at least two different guide molecules) and the Cas13d protein coding sequence. Alternatively, the guide molecule (which can be part of an array that includes at least two different guide molecules) and the Cas13d protein coding sequence can be operably linked to different promoters.

E. Recombinant Cells and Cell-Free Systems

Cells that include a non-native Cas13d protein, a non-native Cas13d protein coding sequence, a guide molecule (or coding sequence), or combinations thereof, are provided. Such recombinant cells can be used in the methods, compositions, and kits provided herein. Nucleic acid molecules encoding a Cas13d protein disclosed herein and/or nucleic acid molecules encoding a guide molecule can be introduced into cells to generate transformed (e.g., recombinant) cells. In some examples, such cells are generated by introducing one or more non-native Cas13d proteins and one or more guide molecules (e.g., gRNAs) into the cell, for example as a ribonucleoprotein (RNP) complex.

Similarly, cell free systems, such as those generated from lysed cells (or those that include a Cas13d RNP in a test tube or other vessel, into which in vitro transcribed or chemically synthesized target RNAs are added), which include a Cas13d protein, a Cas13d protein coding sequence, a guide molecule (or coding sequence), or combinations thereof, are provided. Such cell free systems can be used in the methods, compositions, and kits provided herein. In some examples, one or more non-native Cas13d proteins and one or more guide molecules (e.g., gRNAs) are added to a cell free system, for example as an RNP complex.

Thus, cells and cell-free systems containing a Cas13d protein (such as a protein having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, or 253) are disclosed. Similarly, cells and cell-free systems containing a guide molecule, such as one having at least one DR sequence having at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, or 137, and in some examples also at least one spacer sequence complementary to a target RNA, are provided.
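The percent-identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and that identity is counted over all aligned columns; the function names are hypothetical, and alignment programs differ in how they define the denominator, so this is not presented as the method used to compare the disclosed SEQ ID NOs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; '-' marks a gap (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Skip columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_threshold(query, reference, 95.0)` would test a pre-aligned pair against the "at least 95%" level.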

Such recombinant cells (e.g., which can be used to generate a cell-free system) can be eukaryotic or prokaryotic. Examples of such cells include, but are not limited to, bacteria, archaea, plant, fungal, yeast, insect, and mammalian cells, such as Lactobacillus, Lactococcus, Bacillus (such as B. subtilis), Escherichia (such as E. coli), Clostridium, Saccharomyces or Pichia (such as S. cerevisiae or P. pastoris), Kluyveromyces lactis, Salmonella typhimurium, Drosophila cells, C. elegans cells, Xenopus cells, SF9 cells, C129 cells, 293 cells, Neurospora, and immortalized mammalian cell lines (e.g., HeLa cells, myeloid cell lines, and lymphoid cell lines).

In one example, the cell is a prokaryotic cell, such as a bacterial cell, such as E. coli.

In one example, the cell is a eukaryotic cell, such as a mammalian cell, such as a human cell. In one example, the cell is a primary eukaryotic cell, a stem cell, a tumor/cancer cell, a circulating tumor cell (CTC), a blood cell (e.g., T cell, B cell, NK cell, Tregs, etc.), a hematopoietic stem cell, a specialized immune cell (e.g., tumor-infiltrating lymphocyte or tumor-suppressed lymphocytes), or a stromal cell in the tumor microenvironment (e.g., cancer-associated fibroblasts, etc.). In one example, the cell is a brain cell (e.g., neurons, astrocytes, microglia, retinal ganglion cells, rods/cones, etc.) of the central or peripheral nervous system.

In one example, a cell is part of (or obtained from) a biological sample, such as a biological specimen containing genomic DNA, RNA (e.g., mRNA), protein, or combinations thereof, obtained from a subject. Examples include, but are not limited to, peripheral blood, serum, plasma, urine, saliva, sputum, tissue biopsy, fine needle aspirate, surgical specimen, and autopsy material. Such cells can also be used to generate a cell-free system.

In one example the cell (or cell-free system) is from a tumor, such as a hematological tumor (e.g., leukemias, including acute leukemias (such as acute lymphocytic leukemia, acute myelocytic leukemia, acute myelogenous leukemia and myeloblastic, promyelocytic, myelomonocytic, monocytic and erythroleukemia), chronic leukemias (such as chronic myelocytic (granulocytic) leukemia, chronic myelogenous leukemia, and chronic lymphocytic leukemia), polycythemia vera, lymphoma, Hodgkin's disease, non-Hodgkin's lymphoma (including low-, intermediate-, and high-grade), multiple myeloma, Waldenström's macroglobulinemia, heavy chain disease, myelodysplastic syndrome, mantle cell lymphoma and myelodysplasia) or solid tumor (e.g., sarcomas and carcinomas: fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, and other sarcomas, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, colon carcinoma, lymphoid malignancy, pancreatic cancer, breast cancer, lung cancers, ovarian cancer, prostate cancer, hepatocellular carcinoma, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, Wilms' tumor, cervical cancer, testicular tumor, bladder carcinoma, and CNS tumors (such as a glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, melanoma, neuroblastoma and retinoblastoma)).

In one example the cell (or cell-free system) is obtained from an environmental sample, such as a water, soil, or air sample.

F. Compositions & Kits

Compositions and kits that include a Cas13d protein, a Cas13d protein coding sequence, a guide molecule (or coding sequence), or combinations thereof, are provided. In one example, the composition or kit includes an RNP complex composed of one or more Cas13d proteins and one or more guide molecules (e.g., gRNAs). In one example, the composition or kit includes a vector encoding a Cas13d protein, a guide molecule, or both. In one example, the composition or kit includes a cell, such as a bacterial cell or eukaryotic cell, that includes a non-native Cas13d protein, a non-native Cas13d protein coding sequence, a guide molecule (or coding sequence), or combinations thereof. In one example, the composition or kit includes a cell-free system that includes a non-native Cas13d protein, a non-native Cas13d protein coding sequence, a guide molecule (or coding sequence), or combinations thereof.

Such compositions can include a pharmaceutically acceptable carrier (e.g., saline, water, PBS). In some examples, the composition is a liquid, lyophilized powder, or cryopreserved.

In some examples, the kit includes a delivery system (e.g., a liposome, a particle, an exosome, a microvesicle, a viral vector, or a plasmid), and/or a label (e.g., a peptide or antibody that can be conjugated either directly to a Cas13d RNP or to a particle containing the Cas13d RNP to direct cell type specific uptake, enhance endosomal escape, enable blood-brain barrier crossing, etc.). In some examples, the kits further include cell culture or growth media, such as media appropriate for growing bacterial, plant, insect, or mammalian cells.

In some examples, such parts of a kit are in separate containers.

G. Targeting RNA

The disclosed Cas13d proteins (and coding sequences), and guide molecules (e.g., gRNA and coding sequences) can be used in a CRISPR/Cas system to target one or more RNA molecules, such as those present in a sample (such as a biological sample, environmental sample (e.g., soil, air or water sample), and the like). In one example, the target RNA is a coding RNA. In one example, the target RNA is a nuclear RNA. In other examples, the target RNA is non-coding RNA (such as functional RNA, siRNA, microRNA, snRNA, snoRNA, piRNA, scaRNA, tRNA, rRNA, lncRNA, or lincRNA). Such RNA targeting methods can be performed in vitro (such as in cell culture or in a cell-free system), or in vivo (such as in an organism, embryo, or mammal).

The CRISPR/Cas system provided herein includes two general components: (1) a Cas13d protein or its coding sequence (whose expression can be driven by a promoter) and (2) a guide nucleic acid molecule, such as RNA (gRNA), which is specific for the target RNA (whose expression can also be driven by a promoter). When introduced into cells (or into a cell-free system), for example (1) as Cas13d mRNA and Cas13d gRNA, (2) as part of a single vector or plasmid or divided into multiple vectors or plasmids, (3) as separate Cas13d protein and guide molecules, or (4) as an RNP complex of the Cas13d protein and guide molecule, the guide molecule guides the Cas13d to the target RNA. If the Cas13d protein has a native HEPN domain(s) or is fused to an appropriate effector domain bearing RNase activity, the RNA can be cut. If the Cas13d protein has a mutated HEPN domain(s), a guide array can be processed into mature gRNAs, but the target RNA is not cut. Using this system, RNA sequences are easily targeted, for example edited or detected, optionally with an effector domain.

1. Introduction of Cas13d Protein Directly into a Cell

In one example, the Cas13d protein is expressed in a recombinant cell, such as E. coli, and purified. The resulting purified Cas13d protein, along with an appropriate guide molecule specific for the target RNA, is then introduced into a cell or organism where one or more RNAs can be targeted. In some examples, the Cas13d protein and guide nucleic acid molecule are introduced as separate components into the target cell/organism. In other examples, the purified Cas13d protein is complexed with the guide nucleic acid (e.g., gRNA), and this ribonucleoprotein (RNP) complex is introduced into target cells (e.g., using transfection or injection). In some examples, the Cas13d protein and guide molecule are injected into an embryo (such as a human, mouse, zebrafish, or Xenopus embryo).

Once the Cas13d protein and guide nucleic acid molecule are in the cell, one or more RNAs can be targeted.

2. Expression of Cas13d from Nucleic Acids

In one example, the Cas13d protein is expressed from a nucleic acid molecule in a cell containing a target RNA, for example an RNA to be detected or modified. In some such examples, the Cas13d protein is expressed from a vector, such as a viral vector or plasmid introduced into a cell or into a cell-free system. This results in the production of the Cas13d protein in the cell, organism, or system. In addition, these nucleic acid molecules can be co-expressed in the cell/organism/system with the guide nucleic acid molecule (e.g., gRNA) specific for the target RNA.

In one example, multiple plasmids or vectors are used for RNA targeting. The nucleic acid molecule encoding the Cas13d can be provided, for example, on one vector or plasmid, and the guide nucleic acid molecule (e.g., gRNA) on another plasmid or vector. Multiple plasmids or viral vectors can be mixed and introduced into cells (or a cell-free system) at the same time, or separately.

In some examples, multiple nucleic acid molecules are expressed from a single vector or plasmid. For example, a single vector can include both the nucleic acid molecule encoding the Cas13d and the guide molecule.

In some examples a plurality of different guide molecules (e.g., gRNAs), one for each target (such as 1, 2, 3, 4, 5, or 10 different targets), are present on a single array and/or vector. In one example, the method includes delivering a plurality of gRNAs (such as at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, or at least 50 different gRNAs), which are part of an array (which can be part of a vector, such as a viral vector or plasmid). Once introduced into a cell or cell-free system, the array is processed by the Cas13d protein into the individual mature gRNAs.
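The relationship between a guide array and its individual gRNAs can be illustrated in silico. The following is a minimal sketch, assuming each unit of the array is a direct repeat (DR) followed by its spacer and that the spacers do not themselves contain the DR sequence; the DR and spacer strings used with it are placeholders, not any of the disclosed SEQ ID NOs, and the sketch does not model any trimming that may occur during actual processing by the Cas13d protein.

```python
def split_array(array: str, dr: str) -> list[str]:
    """Return the spacer that follows each DR in a DR-spacer-DR-spacer array."""
    array, dr = array.upper(), dr.upper()
    if not dr:
        raise ValueError("a non-empty DR sequence is required")
    parts = array.split(dr)
    # parts[0] is any sequence upstream of the first DR; it is not a spacer.
    return [part for part in parts[1:] if part]


def individual_guides(array: str, dr: str) -> list[str]:
    """Pair each spacer with its upstream DR to give one guide per target."""
    return [dr.upper() + spacer for spacer in split_array(array, dr)]
```

With a placeholder DR of `"CACC"` and the array `"CACC" + "UUUUGGGG" + "CACC" + "AAAAUUUU"`, `split_array` recovers the two spacers in order, one per intended target.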

The nucleic acid molecules expressed from the vector can be under the control of a promoter and optionally contain selection markers (such as antibiotic resistance).

In some examples, the protein and guide molecules are expressed by an embryo (such as a zebrafish or Xenopus embryo). The Cas13d protein can be expressed from injected plasmid DNA, injected mRNA, or copies stably integrated into the animal genome. The gRNA can be directly injected or expressed from a vector or from copies stably integrated into the animal genome.

3. Targets

One or more RNAs can be targeted by the disclosed methods, such as at least 1, at least 2, at least 3, at least 4 or at least 5 different RNAs in a cell, cell-free system, or organism, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 different RNAs. In one example, the RNA is associated with a disease such as cystic fibrosis, Huntington's disease, Tay-Sachs, Fragile X syndrome, Fragile X-associated tremor/ataxia syndrome, Duchenne muscular dystrophy, myotonic dystrophy, spinal muscular atrophy, spinocerebellar ataxia, or familial ALS. In one example, the RNA is associated with cancer (e.g., a cancer of the lung, breast, colon, liver, pancreas, prostate, bone, brain, skin (e.g., melanoma), or kidney). Examples of target RNAs include, but are not limited to, those associated with cancer (e.g., BCR-ABL, Ras, Raf, p53, BRCA1, BRCA2, CXCR4, beta-catenin, HER2, and CDK4).

In one example, the RNA is associated with viral infection, such as infection by a positive-strand RNA virus, such as Picornaviruses (such as Aphthoviridae [for example foot-and-mouth-disease virus (FMDV)]; Cardioviridae; Enteroviridae (such as Coxsackie viruses, Echoviruses, Enteroviruses, and Polioviruses); Rhinoviridae (Rhinoviruses)); Hepataviridae (Hepatitis A viruses); Togaviruses (examples of which include rubella; alphaviruses (such as Western equine encephalitis virus, Eastern equine encephalitis virus, and Venezuelan equine encephalitis virus)); Flaviviruses (examples of which include Dengue virus, West Nile virus, and Japanese encephalitis virus); Caliciviridae (which includes Norovirus and Sapovirus); or Coronaviruses (examples of which include SARS coronaviruses, such as the Urbani strain), or a negative-strand RNA virus, such as Orthomyxoviruses (such as the influenza virus), Rhabdoviruses (such as Rabies virus), and Paramyxoviruses (examples of which include measles virus, respiratory syncytial virus, or parainfluenza viruses), or a DNA viral infection (such as infection by Herpesviruses (such as Varicella-zoster virus, for example the Oka strain; cytomegalovirus; and Herpes simplex virus (HSV) types 1 and 2), Adenoviruses (such as Adenovirus type 1 and Adenovirus type 41), Poxviruses (such as Vaccinia virus), or Parvoviruses (such as Parvovirus B19)).

In one example, the RNA is associated with a bacterial infection or property of a bacterial infection, such as bacterial resistance, persistence, or antibiotic resistance. Detection of these RNAs can be used for diagnostic methods, while editing these RNAs in cell-based or cell-free systems can be used for therapeutic methods.

4. Methods of Detecting RNA

In one example, the method of targeting an RNA results in detecting, visualizing, or labeling a target RNA. For example, by using at least one Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4), at least one gRNA containing at least one spacer sequence specific for the target RNA, and an effector module, the target RNA will be recognized by Cas13d but will not be cut or nicked while the effector module becomes activated. In some examples, such a method is used to detect a target RNA. Such a method can be used in a cell or cell-free system to determine if a target RNA is present, such as in a tumor cell. In some examples, the cell or cell-free system is obtained from a tissue sample, blood sample or saliva sample.

In one example, the method of detecting RNA uses a Cas13d protein fused to a fluorescent protein or other detectable label along with a gRNA containing a spacer sequence specific for the target RNA. Binding of Cas13d to the target RNA can be visualized by microscopy or other methods of imaging. In another example, RNA aptamer sequences can be appended to or inserted within the gRNA molecule, such as MS2, PP7, Qβ, and other aptamers. The introduction of proteins that specifically bind to these aptamers, e.g., the MS2 phage coat protein, fused to a fluorescent protein or other detectable label can be used to detect the target RNA because the Cas13d-gRNA-target RNA complex will be labeled via the aptamer interaction.

In another example, the method of detecting RNA is a transcriptional sensor (e.g., as part of a synthetic circuit) for diagnostics or therapeutics. For example, the transcriptional sensor can include at least one Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4), at least one gRNA containing at least one spacer sequence specific for the target RNA, and an effector module such as an optionally split fluorescent protein or probe; an optionally split luminescent protein or probe; an optionally split enzyme that catalyzes a detectable reaction such as ubiquitin or TEV protease; a FRET-compatible protein pair; one or more transcription factor(s) fused to Cas13d via cleavable linkers; a split intein that trans-splices a protein to restore its function such as a transcription factor; a kinase-substrate pair that activates upon phosphorylation; one, two, or more monomers that activate upon dimerization or multimerization; or one or more proteins that induce conformational and functional change upon interaction. In one example, the spatial proximity of two or more Cas13d proteins and gRNAs due to binding a particular transcript would activate the effector module, resulting in a detectable signal or detectable activity in the cell.

For example, the transcriptional sensor could allow a cancer-specific transcript, inflammation-specific transcript, disease-specific transcript, or cell state-specific transcript to be detected. A synthetic circuit containing a Cas13d-based system that is able to sense particular transcripts could encode conditional logic, e.g., requiring target detection to up- or down-regulate a gene for therapeutic application.

In one example, the method results in a detectable agent being bound to the target RNA, which can be detected. For example, two separate Cas13d fusion proteins that each include part of a fluorophore (e.g., GFP), and two different gRNAs with different spacer sequences that target regions of an RNA in close proximity, can be used. When the two parts bind to the target RNA in proximity, the two parts of the fluorophore form a complete fluorophore, thereby generating a detectable signal.
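The selection of two spacers that bind nearby regions of one target RNA can be sketched as follows. This is a minimal illustration, assuming each spacer is the reverse complement of its target window and that the two windows are separated by a chosen gap; the function names, window positions, spacer length, and gap are hypothetical parameters, and no particular spacing is asserted to be required for reconstituting a split fluorophore.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def paired_spacers(target: str, start_a: int, spacer_len: int, gap: int) -> tuple[str, str]:
    """Spacers complementary to two target windows separated by `gap` nucleotides."""
    start_b = start_a + spacer_len + gap
    end_b = start_b + spacer_len
    if start_a < 0 or gap < 0 or spacer_len <= 0 or end_b > len(target):
        raise ValueError("windows do not fit within the target sequence")
    window_a = target[start_a:start_a + spacer_len]
    window_b = target[start_b:end_b]
    return reverse_complement(window_a), reverse_complement(window_b)
```

Each returned spacer would be paired with a DR to form one of the two different gRNAs, one per Cas13d fusion protein.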

In one example, the method results in RNA detection, for example by triggering a response such as expression of a second gene, modification of a protein, translocation of a protein or RNA to a different location, induction of cell death via suicide gene, induction of cell proliferation, induction of a transgene that enables a secondary function, induction of a permanent change in DNA sequence to enable storing a memory of past transcriptional events, or altering the RNA to enable pulldown.

In one embodiment, two halves of a transcription factor could be linked to two separate Cas13d proteins via a split intein system. The Cas13d proteins are provided with two different gRNAs with different spacer sequences that target regions of an RNA in close proximity. Upon binding to the target RNA in proximity, the split inteins trans-splice a reconstituted transcription factor (TF) so that it can translocate to the nucleus and turn on a target gene or cluster of target genes. In one example, the target gene could be an endogenous gene in the cell. In another example, the target gene could be a transgene expressed on a vector or introduced through genetic engineering, such as a fluorescent protein or toxin.

5. Methods of Detecting RNA in Cell-Free Systems

In one example, the method of detecting a target RNA in a cell-free system results in a detectable label or enzyme activity. For example, by using at least one Cas13d protein (e.g., SEQ ID NO: 3, 42, 62, 70, 82, 83, or 92), at least one gRNA containing at least one spacer sequence specific for the target RNA, and a detectable label, the target RNA will be recognized by Cas13d. The binding of the target RNA by Cas13d triggers its RNase activity, which can lead to the cleavage of the target RNA as well as the detectable label.

In one example, the detectable label is an RNA linked to a fluorescent probe and quencher. The intact detectable RNA links the fluorescent probe and quencher, suppressing fluorescence. Upon cleavage of the detectable RNA by Cas13d, the fluorescent probe is released from the quencher and displays fluorescent activity. Such a method can be used to determine if a target RNA is present in a lysed cell sample, lysed tissue sample, blood sample, saliva sample, environmental sample (such as a water, soil, or air sample), or other lysed cell or cell-free sample. Such a method can also be used to detect a pathogen, such as a virus or bacterium, or diagnose a disease state, such as a cancer.
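One way to turn the fluorescence readout into a present/absent call is sketched below. The decision rule (sample mean exceeding the no-target control mean by a chosen number of standard deviations), the default of three standard deviations, and the function name are illustrative assumptions and are not part of the disclosed method; the fluorescence values used with it would come from the user's own instrument.

```python
from statistics import mean, stdev


def target_detected(sample: list[float], no_target_control: list[float], n_sd: float = 3.0) -> bool:
    """Call the target present if sample fluorescence clearly exceeds the control.

    The cutoff (control mean + n_sd * control standard deviation) is an
    assumed rule chosen for illustration only.
    """
    if len(no_target_control) < 2:
        raise ValueError("at least two control replicates are needed for a standard deviation")
    if not sample:
        raise ValueError("at least one sample reading is required")
    cutoff = mean(no_target_control) + n_sd * stdev(no_target_control)
    return mean(sample) > cutoff
```

A reaction in which the quenched reporter RNA has been cleaved would be expected to read well above the control, while a reaction lacking the target RNA would remain near it.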

In one example, the detection of the target RNA aids in the diagnosis of disease and/or pathological state, or the existence of a viral or bacterial infection. For example, Cas13d-mediated detection of non-coding RNAs such as PCA3 can be used to diagnose prostate cancer if detected in patient urine. In another example, Cas13d-mediated detection of the lncRNA-AA174084, which is a biomarker of gastric cancer, can be used to diagnose gastric cancer.

6. Methods of Editing Target RNA

In one example, the method of targeting an RNA results in editing the sequence of a target RNA. For example, by using a Cas13d protein with a non-mutated HEPN domain (e.g., SEQ ID NO: 1, 3, 42, 62, 70, 82, 83, or 92), and a gRNA containing at least one spacer sequence specific for the target RNA, the target RNA can be cut or nicked at a precise location. In some examples, such a method is used to decrease expression of a target RNA, which will decrease translation of the corresponding protein. Such a method can be used in a cell where increased expression of an RNA is not desired. In one example, the RNA is associated with a disease such as cystic fibrosis, Huntington's disease, Tay-Sachs, Fragile X syndrome, Fragile X-associated tremor/ataxia syndrome, muscular dystrophy, myotonic dystrophy, spinal muscular atrophy, spinocerebellar ataxia, or familial ALS. In another example, the RNA is associated with cancer (e.g., a cancer of the lung, breast, colon, liver, pancreas, prostate, bone, brain, skin (e.g., melanoma), or kidney). Examples of target RNAs include, but are not limited to, those associated with cancer (e.g., PD-L1, BCR-ABL, Ras, Raf, p53, BRCA1, BRCA2, CXCR4, beta-catenin, HER2, and CDK4). Editing such target RNAs can have a therapeutic effect.

In another example, the RNA is expressed in an immune cell. The target RNA could, for example, code for a protein leading to the repression of a desirable immune response, such as infiltration of a tumor. Knock-down of such an RNA could enable progression of such a desirable immune response (e.g., PD1, CTLA4, LAG3, TIM3). In another example, the target RNA encodes a protein resulting in the undesirable activation of an immune response, for example in the context of an autoimmune disease such as multiple sclerosis, Crohn's disease, lupus, or rheumatoid arthritis.

In one example, targeting the target RNA allows for decreasing expression of the target protein encoded by the RNA. For example, by using a Cas13d fusion protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4) and a translational repression domain (such as Pumilio or FBF PUF proteins, deadenylases, CAF1, Argonaute proteins, and others), and a guide RNA containing at least one spacer sequence specific for the target RNA, expression of a target RNA can be decreased.

In some examples, Cas13d can be fused to a ribonuclease (such as a PIN endonuclease domain, an NYN domain, an SMR domain from SOT1, or an RNase domain from Staphylococcal nuclease) or a domain that affects RNA stability (such as tristetraprolin or domains from UPF1, EXOSC5, and STAU1).

In another example, RNA aptamer sequences can be appended to or inserted within the gRNA molecule, such as MS2, PP7, Qβ, and other aptamers. Proteins that specifically bind to these aptamers, e.g., the MS2 phage coat protein, can be fused to a translational repression domain, a ribonuclease, or a domain that affects RNA stability. This aptamer-effector domain fusion can be used to target the target RNA because the Cas13d and gRNA complex will guide the aptamer protein-effector domain in proximity to the target RNA.

Such a method can be used in a cell where increased expression of an RNA is not desired, such as when an expressed RNA is associated with a disease such as cystic fibrosis, Huntington's disease, Tay-Sachs, Fragile X syndrome, Fragile X-associated tremor/ataxia syndrome, muscular dystrophy, myotonic dystrophy, spinal muscular atrophy, spinocerebellar ataxia, or familial ALS. In another example, the target RNA is associated with cancer (e.g., a cancer of the lung, breast, colon, liver, pancreas, prostate, bone, brain, skin (e.g., melanoma), or kidney). Examples of target RNAs include, but are not limited to, those associated with cancer (e.g., PD-L1, BCR-ABL, Ras, Raf, p53, BRCA1, BRCA2, CXCR4, beta-catenin, HER2, and CDK4). Editing such target RNAs would have a therapeutic effect.

In another example, the RNA is expressed in an immune cell. The target RNA could, for example, code for a protein leading to the repression of a desirable immune response, such as infiltration of a tumor. Knock-down of such an RNA could enable progression of such a desirable immune response (e.g., PD1, CTLA4, LAG3, TIM3). In another example, the target RNA could encode a protein resulting in the undesirable activation of an immune response, for example in the context of an autoimmune disease such as multiple sclerosis, Crohn's disease, lupus, or rheumatoid arthritis.

In one example, targeting the target RNA allows for activating or increasing expression of the target RNA. For example, by using a Cas13d fusion protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4) and a translational activation domain (such as eIF4E and other translation initiation factors, a domain of the yeast poly(A)-binding protein or GLD2), and a guide RNA containing at least one spacer sequence specific for the target RNA, expression of a target RNA can be increased. Aptamer introduction into the gRNA with a cognate aptamer-binding protein fused to a translational activation domain can also be used. In one example, RNA aptamer sequences are appended to or inserted within the gRNA molecule, such as MS2, PP7, Qβ, and other aptamers. The introduction of proteins that specifically bind to these aptamers, e.g., the MS2 phage coat protein, fused to a translational activation domain can be used to target the target RNA because the Cas13d and gRNA complex will bring the aptamer protein-translational activation domain in proximity to the target RNA.

In some examples, such a method is used to increase the activity or expression of a target RNA, which will increase translation of the corresponding protein (if the RNA is a coding RNA). Such a method can be used in a cell where increased expression of an RNA is desired, such as in a heterozygous genetic disease or in disorders caused by copy number variation. Increasing translation of a desired protein product could be therapeutic in nature.

In another example, increasing the expression of a target RNA (such as Cyclin B1) can render the target cell (such as a cancer cell) more sensitive to drugs (such as chemotherapeutic agents).

In one example, targeting the target RNA allows for one or more RNA base substitutions, RNA base edits, RNA base deletions, RNA base insertions, or combinations thereof, in the target RNA. In some examples, the Cas13d protein with a mutated HEPN domain is associated, either via direct fusion or a gRNA-aptamer modification, with an effector domain that allows base edits (such as a cytidine deaminase, PPR protein, adenosine deaminase, ADAR family protein, or APOBEC family protein). In some examples, such a method is used to modify an RNA sequence, edit an RNA mutation, or modify an RNA transcript (e.g., gene therapy), for example to treat diseases such as ALS and melanoma or genetic disorders caused by undesired splice sites, such as Leber congenital amaurosis.

In one example, targeting the target RNA allows for methylating the target RNA. Some examples may use a chimeric Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4) associated either via direct fusion or a gRNA-aptamer modification with a methylation domain (e.g., m6A), and a guide RNA containing at least one spacer sequence specific for the target RNA. In some examples, such a method is used to combat aberrant RNA demethylation.

In one example, such a method is used to modify the methylation levels of pluripotency transcripts such as NANOG or KLF4, for example to decrease their stability in breast cancer cells, which can suppress the acquisition of breast cancer stem cell phenotypes that are associated with increased proliferation and cancer stem cell formation.

In one example, targeting the target RNA allows for demethylating the target RNA. Some examples can use a Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4), a guide RNA containing at least one spacer sequence specific for the target RNA, and a demethylation domain (e.g., human alkylation repair homolog 5 or Alkbh5). The demethylation domain can be associated either via direct fusion to the Cas13d protein or via a gRNA-aptamer modification. In some examples, such a method is used to reverse aberrant RNA methylation, for example to treat myeloid leukemia by decreasing m6A levels.

In one example, targeting the target RNA allows for binding to the target RNA. For example, by using a Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4) and a guide RNA containing at least one spacer sequence specific for the target RNA, molecules can be bound or tethered to a target RNA. In some examples, such a method is used to capture the target RNA (e.g., immuno-precipitation). This can be used as part of a kit to identify the proteins interacting with a specific RNA transcript. In one example, an epitope tagged Cas13d (e.g., FLAG, HA, biotin, HALO tag) can be targeted to specific target RNAs and cross-linked via fixation (e.g., with paraformaldehyde or glutaraldehyde). Immunoprecipitation of Cas13d with an epitope-recognizing antibody allows for the identification of co-immunoprecipitated proteins via Western blot or mass spectrometry.

In another example, Cas13d can be fused to a biotin ligase (such as BirA) or a peroxidase (such as APEX2) in order to biotinylate proteins that interact with the target RNA. Labeled proteins can then be pulled down with streptavidin beads followed by mass spectrometry or Western blot.

In some examples, biotinylated Cas13d could be targeted to ribosomal RNA sequences with a gRNA. Streptavidin bead-mediated pulldown can be used to deplete rRNA for RNA sequencing library preparation.

In one example, targeting the target RNA allows for masking the target RNA. For example, by using a Cas13d protein with a mutated or intact HEPN domain and a guide RNA containing at least one spacer sequence specific for the target RNA, a target RNA can be masked from RNA-binding proteins or RNA-binding elements such as miRNAs.

In some examples, the Cas13d can be used to mask RNA binding sites from RNA-binding proteins (RBPs). In another example, Cas13d can mask miRNA binding sites. For example, the liver-specific miR-122 forms a complex with Hepatitis C viral RNA which protects it from degradation. A HEPN-active Cas13d protein could be targeted to the miR-122 binding site on the viral RNA to synergistically combat HCV infections by simultaneously reversing miR-122-mediated protection and directly degrading HCV RNA. In some examples, such a method is used to preserve or protect the target RNA molecule, for example to protect the target RNA from degradation. For example, by targeting AU-rich elements in the 3′ UTR of a target gene, a HEPN-mutated Cas13d can block binding of RNA-binding proteins such as tristetraprolin (TTP) or AUF1, which lead to degradation of the target transcript.

In one example, targeting the target RNA allows for changing splicing of the target RNA. Both the direct binding of splice acceptor and/or donor sites as well as splice effector domains can be used to manipulate splicing. For example, by using a Cas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4), a guide RNA containing at least one spacer sequence specific for the target RNA, and optionally an effector domain that affects splicing (such as the RS-rich domain of SRSF1, the Gly-rich domain of hnRNP A1, the alanine-rich motif of RBM4, or the proline-rich motif of DAZAP1), alternative splicing of the RNA can be achieved.

In some examples, such a method is used for exon inclusion, for example to include exon 2 of acid alpha-glucosidase (GAA) to treat Pompe disease or to include exon 7 of SMN2 to treat spinal muscular atrophy (SMA). In some examples, such a method is used for exon exclusion, for example to restore the reading frame of dystrophin to treat Duchenne muscular dystrophy or to shift the splicing of the Bcl-x pre-mRNA from the antiapoptotic long isoform to the proapoptotic short isoform to treat cancer.

In some examples, the method uses the Cas13d protein with a mutated HEPN domain to mask splice acceptor or donor sites, for example to create neoantigens to make cold tumors hot. By affecting the splicing of certain target pre-mRNAs, this method can generate novel exon-exon junctions that can lead to the creation of neo-epitopes in cancer cells. This can make a cancer cell vulnerable to the immune system due to the display of unnatural antigens. In other examples, this method can be used to dynamically manipulate isoform ratios or to restore the reading frame of a protein (e.g., dystrophin for Duchenne muscular dystrophy).

In one example, targeting the target RNA allows for controlling transcript trafficking of the target RNA. For example, by using a Cas13d fusion protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4) and a subcellular localization signal or export sequence, and a guide RNA containing at least one spacer sequence specific for the target RNA, trafficking of the target RNA can be controlled. In some examples, such a method is used to traffic the target RNA molecule to a particular organelle or cytosolic compartment, or even export the target RNA transcript, for example to endosomes for extracellular release.

In another example, the method can affect RNA trafficking. For example, the zipcode binding protein ZBP1 specifically recognizes an RNA sequence 5′-CGGAC(C/A-CA-C/U) that leads to localization of certain transcripts to the leading edge of fibroblasts. By masking or manipulating particular RNA zipcodes or regulatory sequences from recognition by regulatory protein complexes, this method can affect RNA localization or trafficking within a cell.

In one example, the target RNA is a nuclearly localized RNA. For example, by using a Cas13d protein with a non-mutated HEPN domain (e.g., SEQ ID NOS: 1, 3, 42, 62, 70, 82, 83, and 92) fused to a nuclear localization signal and a guide RNA containing at least one spacer sequence specific for the target nuclear RNA, the nuclearly localized RNA can be targeted and degraded. In some examples, such a method is used to degrade the target nuclear RNA molecule, for example to knock down a non-coding nuclear RNA such as HOTAIR, which is associated with metastatic progression in breast cancer.

In one example, the target RNA is viral RNA or a transcript of a DNA virus. For example, a Cas13d protein with a non-mutated HEPN domain (e.g., SEQ ID NOS: 1, 3, 42, 62, 70, 82, 83, and 92) and a guide RNA containing at least one spacer sequence specific for the target RNA can be used. In some examples, such a method is used to treat a viral infection, for example by cutting the viral RNA or transcript of a DNA virus. The infection can be an RNA viral infection, such as infection by a positive-strand RNA virus, for example Picornaviruses (such as Aphthoviridae [for example foot-and-mouth-disease virus (FMDV)]; Cardioviridae; Enteroviridae (such as Coxsackie viruses, Echoviruses, Enteroviruses, and Polioviruses); and Rhinoviridae (Rhinoviruses)); Hepataviridae (Hepatitis A viruses); Togaviruses (examples of which include rubella and alphaviruses (such as Western equine encephalitis virus, Eastern equine encephalitis virus, and Venezuelan equine encephalitis virus)); Flaviviruses (examples of which include Dengue virus, West Nile virus, and Japanese encephalitis virus); Caliciviridae (which includes Norovirus and Sapovirus); or Coronaviruses (examples of which include SARS coronaviruses, such as the Urbani strain); or infection by a negative-strand RNA virus, such as Orthomyxoviruses (such as the influenza virus), Rhabdoviruses (such as Rabies virus), and Paramyxoviruses (examples of which include measles virus, respiratory syncytial virus, or parainfluenza viruses). The infection can also be a DNA viral infection, such as infection by Herpesviruses (such as Varicella-zoster virus, for example the Oka strain; cytomegalovirus; and Herpes simplex virus (HSV) types 1 and 2), Adenoviruses (such as Adenovirus type 1 and Adenovirus type 41), Poxviruses (such as Vaccinia virus), or Parvoviruses (such as Parvovirus B19). Thus, such methods can be used as an RNA-based antiviral or antimicrobial.

Example 1 Materials and Methods

This example describes the materials and methods used to obtain the results shown in Examples 2-7.

Cell Culture of Human Embryonic Kidney (HEK) Cell Line 293FT

Human embryonic kidney (HEK) cell line 293FT (Thermo Fisher) was maintained in DMEM (4.5 g/L glucose), supplemented with 10% FBS (GE Life Sciences) and 10 mM HEPES at 37° C. with 5% CO₂. Upon reaching 80-90% confluency, cells were dissociated using TrypLE Express (Life Technologies) and passaged at a ratio of 1:2.

Cell Culture of Human Bone Osteosarcoma Epithelial Cell Line U2OS

Human bone osteosarcoma epithelial U2OS cells were maintained in DMEM (4.5 g/L glucose) supplemented with 10% FBS and 10 mM HEPES at 37° C. with 5% CO₂. Cells were passaged at a 1:3 ratio upon reaching 70% confluence. This cell line was not authenticated.

Maintenance of Induced Pluripotent Stem Cells and Neuronal Differentiation

Stable human iPSC lines containing the FTDP-17 IVS10+16 mutation or age- and sex-matched control lines were obtained from the laboratory of Fen-Biao Gao (Biswas et al., 2016). Briefly, cells obtained from one male patient with the MAPT IVS10+16 mutation and two separate lines from one male control patient were reprogrammed into hiPSCs (Almeida et al., 2012). iPSCs were transduced with lentivirus containing a doxycycline-inducible Ngn2 cassette. Lentiviral plasmids were a gift from S. Schafer and F. Gage. iPSCs were then passaged with Accutase and plated into a Matrigel-coated 6-well plate with mTESR media containing ROCK inhibitor Y-27632 (10 μM, Cayman) at 500,000 cells per well. On day 1, media was changed with mTESR. On day 2, media was changed to mTESR containing doxycycline (2 μg/ml, Sigma) to induce Ngn2 expression. On day 3, culture media was replaced with Neural Induction media (NIM, DMEM/F12 (Life Technologies) containing BSA (0.1 mg/ml, Sigma), apo-transferrin (0.1 mg/ml, Sigma), putrescine (16 μg/ml, Sigma), progesterone (0.0625 μg/ml, Sigma), sodium selenite (0.0104 μg/ml, Sigma), insulin (5 μg/ml, Roche), BDNF (10 ng/ml, Peprotech), SB431542 (10 μM, Cayman), LDN-193189 (0.1 μM, Sigma), laminin (2 μg/ml, Life Technologies), doxycycline (2 μg/ml, Sigma) and puromycin (Life Technologies)). NIM was changed daily. Following 3 days of puromycin selection, immature neuronal cells were passaged with Accumax (Innovative Cell Technologies) and plated onto 96-well plates coated with poly-D-lysine and Matrigel in Neural Maturation media (NMM; 1:1 Neurobasal/DMEM (Life Technologies) containing B27 (Life Technologies), BDNF (10 ng/ml, Peprotech), N-Acetyl cysteine (Sigma), laminin (2 μg/ml, Life Technologies), dbcAMP (49 μg/ml, Sigma) and doxycycline (2 μg/ml, Sigma)). Media was replaced the next day (day 7) with NMM containing AraC (2 μg/ml, Sigma) to eliminate any remaining non-differentiated cells. On day 8, AraC was removed and astrocytes were plated on top of neurons to support neuron cultures in NMM containing hbEGF (5 ng/ml, Peprotech). Cells were transduced with AAV on day 10 and assayed on day 24.

Computational Pipeline for Cas13d Identification

We obtained whole genome, chromosome, and scaffold-level prokaryotic genome assemblies from NCBI Genome in June 2016 and compared CRISPRfinder, PILER-CR, and CRT for identifying CRISPR repeats. The 20 kilobase flanking regions around each putative CRISPR repeat were extracted to identify nearby proteins and predicted proteins using Python. Candidate Cas proteins were required to be >750 aa in length and within 5 proteins of the repeat array, and extracted CRISPR loci were filtered out if they contained Cas genes associated with known CRISPR systems such as types I-III CRISPR. Putative effectors were clustered into families via all-by-all BLASTp analysis followed by single-linkage hierarchical clustering where a bit score of at least 60 was required for cluster assignment. Each cluster of at least 2 proteins was subjected to BLAST search against the NCBI non-redundant (nr) protein database, requiring a bit score >200 to assign similarity. The co-occurrence of homologous proteins in each expanded cluster to a CRISPR array was analyzed and required to be >70%. Protein families were sorted by average amino acid length and multiple sequence alignment for each cluster was performed using Clustal Omega and the Geneious aligner with a Blosum62 cost matrix. The RxxxxH HEPN motif was identified in the Cas13d family on the basis of this alignment. TBLASTN was performed on all predicted Cas13d effectors against public metagenome whole genome shotgun sequences without predicted open reading frames (ORFs). The Cas13d family was regularly updated via monthly BLAST search on genome and metagenome databases to identify any newly deposited sequences. New full-length homologs and homologous fragments were aligned using Clustal Omega and clustered using PhyML 3.2. CRISPRDetect was used to predict the direction of direct repeats in the Cas13d array and DR fold predictions were performed using the Andronescu 2007 RNA energy model at 37° C. Sequence logos for Cas13d direct repeats were generated using Geneious 10.
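The candidate-selection step described above can be illustrated with a minimal Python sketch. It is not the pipeline code used to obtain the results; the `Protein` record, the `KNOWN_CAS_SIGNATURES` set, and the function name are hypothetical and are introduced only to show how the length (>750 aa), proximity (within 5 proteins of the repeat array), and known-system exclusion criteria could be combined.

```python
from dataclasses import dataclass

@dataclass
class Protein:
    name: str              # annotation or placeholder name (hypothetical field)
    length_aa: int         # protein length in amino acids
    genes_from_array: int  # rank distance from the repeat array, in protein-coding genes

# Illustrative stand-in for signature genes of known CRISPR systems (assumption).
KNOWN_CAS_SIGNATURES = {"cas3", "cas9", "cas10"}

def candidate_effectors(locus, min_len=750, max_gene_distance=5):
    """Return putative single effectors from one CRISPR-array-anchored locus."""
    # Discard the whole locus if it carries a signature gene of a known system.
    if any(p.name.lower() in KNOWN_CAS_SIGNATURES for p in locus):
        return []
    # Keep large proteins located close to the repeat array.
    return [p for p in locus
            if p.length_aa > min_len and p.genes_from_array <= max_gene_distance]
```

For instance, a locus holding a 930 aa protein two genes from the array, a 300 aa protein, and a 1200 aa protein eight genes away would yield only the first protein as a candidate.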

Protein Expression and Purification

Recombinant Cas13d proteins were PCR amplified from genomic DNA extractions of cultured isolates or metagenomic samples and cloned into a pET-based vector with an N-terminal His-MBP fusion and TEV protease cleavage site. The resulting plasmids were transformed into Rosetta2(DE3) cells (Novagen), induced with 200 μM IPTG at OD₆₀₀ 0.5, and grown for 20 hours at 18° C. Cells were then pelleted, freeze-thawed, and resuspended in Lysis Buffer (50 mM HEPES, 500 mM NaCl, 2 mM MgCl2, 20 mM Imidazole, 1% v/v Triton X-100, 1 mM DTT) supplemented with 1× protease inhibitor tablets, 1 mg/mL lysozyme, 2.5 U/mL Turbo DNase (Life Technologies), and 2.5 U/mL salt active nuclease (Sigma Aldrich). Lysed samples were then sonicated and clarified via centrifugation (18,000×g for 1 hour at 4° C.), filtered with a 0.45 μm PVDF filter and incubated with 50 mL of Ni-NTA Superflow resin (Qiagen) per 10 L of original bacterial culture for 1 hour. The bead-lysate mixture was applied to a chromatography column, washed with 5 column volumes of Lysis Buffer, and eluted with 3 column volumes of Elution Buffer (50 mM HEPES, 500 mM NaCl, 300 mM Imidazole, 0.01% v/v Triton X-100, 10% glycerol, 1 mM DTT). The samples were then dialyzed overnight into TEV Cleavage Buffer (50 mM Tris-HCl, 250 mM KCl, 7.5% v/v glycerol, 0.2 mM TCEP, 0.8 mM DTT, TEV protease) before cation exchange (HiTrap SP, GE Life Sciences) and gel filtration (Superdex 200 16/600, GE Life Sciences). Purified, eluted protein fractions were pooled and frozen at 4 mg/mL in Protein Storage Buffer (50 mM Tris-HCl, 1M NaCl, 10% glycerol, 2 mM DTT).

Preparation of Guide and Target RNAs

Oligonucleotides carrying the T7 promoter and appropriate downstream sequence were synthesized (IDT) and annealed with an antisense T7 oligo for crRNAs and PCR-amplified for target and array templates. Homopolymer target RNAs were synthesized by Synthego. The oligo anneal and PCR templates were in vitro transcribed with the HiScribe T7 High Yield RNA Synthesis kit (New England Biolabs) at 31° C. for 12 hours. For labeled targets, fluorescently labelled aminoallyl-UTP atto 680 (Jena Biosciences) was additionally added at 2 mM. Guide RNAs were purified with RNA-grade Agencourt AMPure XP beads (Beckman Coulter) and arrays and targets were purified with MEGAclear Transcription Clean-Up Kit (Thermo Fisher) and frozen at −80° C. For ssDNA and dsDNA targets, corresponding oligonucleotide sequences were synthesized (IDT) and either gel purified, or PCR amplified and then subsequently gel purified, respectively.

Biochemical Cleavage Reactions

Purified EsCas13d protein and guide RNA were mixed (unless otherwise indicated) at 2:1 molar ratio in RNA Cleavage Buffer (25 mM Tris pH 7.5, 15 mM Tris pH 7.0, 1 mM DTT, 6 mM MgCl2). The reaction was prepared on ice and incubated at 37° C. for 15 minutes prior to the addition of target at 1:2 molar ratio relative to EsCas13d. The reaction was subsequently incubated at 37° C. for 45 minutes and quenched with 1 μL of enzyme stop solution (10 mg/mL Proteinase K, 4M Urea, 80 mM EDTA, 20 mM Tris pH 8.0) at 37° C. for 15 minutes. The reaction was then denatured with 2×RNA loading buffer (2×: 13 mM Ficoll, 8M Urea, 25 mM EDTA), at 85° C. for 10 minutes, and separated on a 10% TBE-Urea gel (Life Technologies). Gels containing labeled targets were visualized on the Odyssey Imager (Li-Cor); unlabeled array or target cleavage gels were stained with SYBR Gold prior to imaging via Gel Doc EZ system (Bio-Rad).
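The molar ratios stated above (protein to guide at 2:1, target to protein at 1:2) can be turned into component amounts with a short Python helper. This is a convenience sketch only; the function name and the choice of picomoles as the working unit are assumptions, and absolute amounts are not specified in this passage.

```python
def reaction_amounts(protein_pmol, protein_to_guide=2.0, target_to_protein=0.5):
    """Return guide and target amounts (pmol) for a given amount of EsCas13d."""
    if protein_pmol <= 0:
        raise ValueError("protein amount must be positive")
    return {
        "protein_pmol": protein_pmol,
        # 2:1 protein:guide means half as much guide as protein.
        "guide_pmol": protein_pmol / protein_to_guide,
        # 1:2 target:protein means half as much target as protein.
        "target_pmol": protein_pmol * target_to_protein,
    }
```

With 10 pmol of protein, the helper returns 5 pmol of guide RNA and 5 pmol of target RNA.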

Transient Transfection of Human Cell Lines

Engineered Cas13 coding sequences were cloned into a standardized plasmid expression backbone containing an EF1a promoter and prepared using the Nucleobond Xtra Midi EF Kit (Macherey-Nagel) according to the manufacturer's protocol. NLS-LwaCas13a-msfGFP and PspCas13b-NES-HIV were PCR amplified from Addgene #103854 and #103862, respectively, a gift from Feng Zhang. Cas13d pre-gRNAs and gRNAs were cloned into a minimal backbone containing a U6 promoter. shRNAs and guides for LwaCas13a were cloned into the same backbone and position matched to their corresponding guide RNA at the 3′ end of the target sequence. Matched gRNAs for PspCas13b were moved to the closest 5′-G nucleotide.

For transient transfection, HEK 293FT cells were plated at a density of 20,000 cells per well in a 96-well plate and transfected at >90% confluence with 200 ng of Cas13 expression plasmid and 200 ng of gRNA expression plasmid using Lipofectamine 2000 (Life Technologies) according to the manufacturer's protocol. Transfected cells were harvested 48-72 hours post-transfection for flow cytometry, gene expression analysis, or other downstream processing.

For reporter assays, HEK 293FT cells were transfected in 96-well format with 192 ng of Cas13d expression plasmid, 192 ng of guide expression plasmid, and 12 ng of mCherry expression plasmid with Lipofectamine 2000 (Life Technologies). Cells were harvested after 48 hours and analyzed by flow cytometry.

U2OS cells were plated at a density of 20,000 cells per well in a 96-well plate and transfected at >90% confluence with 100 ng of Cas13d expression plasmid using Lipofectamine 3000 (Life Technologies) according to the manufacturer's protocol and processed for immunocytochemistry after 48 h.

Flow Cytometry

Cells were dissociated 48 hours post-transfection with TrypLE Express and resuspended in FACS Buffer (1×DPBS^(−/−), 0.2% BSA, 2 mM EDTA). Flow cytometry was performed in 96-well plate format using a MACSQuant VYB (Miltenyi Biotec) and analyzed using FlowJo 10. RG6 was a gift from Thomas Cooper (Addgene plasmid #80167) and modified to replace EGFP with mTagBFP2. All represented samples were assayed with three biological replicates. In the mCherry reporter assay, data is representative of at least 20,000 gated events per condition. In the splicing reporter assay, data is representative of at least 2,500 gated events per condition.

Gene Expression Analysis

Cells were lysed 48 hours post-transfection with DTT-supplemented RLT buffer and total RNA was extracted using RNeasy Mini Plus columns (Qiagen). 200 ng of total RNA was then reverse transcribed using random hexamer primers and RevertAid Reverse Transcriptase (Thermo Fisher) at 25° C. for 10 min, 37° C. for 60 min, and 95° C. for 5 min followed by qPCR using 2×Taqman Fast Advanced Master Mix (Life Technologies) and Taqman probes for GAPDH and the target gene as appropriate (Life Technologies and IDT). Taqman probe and primer sets were generally selected to amplify cDNA across the Cas13 or shRNA target site position to prevent detection of cleaved transcript fragments (see Table S4 of Konermann et al., Cell 173:1-12, 2018, herein incorporated by reference in its entirety). qPCR was carried out in 5 μL multiplexed reactions and 384-well format using the LightCycler 480 Instrument II (Roche). Fold-change was calculated relative to GFP-transfected vehicle controls using the ddCt method. One-way or two-way ANOVA with multiple comparison correction was used to assess statistical significance of transcript changes using Prism 7.
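The ddCt fold-change calculation referred to above can be written out as a short Python sketch. It uses the standard 2^(−ΔΔCt) form, which assumes approximately 100% amplification efficiency for both the target and the reference (here GAPDH) assays; the function and parameter names are illustrative and are not taken from the analysis software.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target transcript by the ddCt method."""
    # Normalize the target Ct to the reference gene within each condition.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control condition.
    dd_ct = d_ct_sample - d_ct_control
    # Each cycle difference corresponds to a two-fold change (efficiency assumption).
    return 2.0 ** (-dd_ct)
```

For example, if the target crosses threshold two cycles later (relative to GAPDH) in the targeting condition than in the control, the function returns 0.25, i.e., 75% knockdown.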

Immunohistochemistry

For immunohistochemical analysis, U2OS cells were cultured on 96-well optically clear plates (Greiner Bio-One), transfected as previously described, then fixed in 4% PFA (Electron Microscopy Sciences) diluted in PBS (Gibco) and washed with 0.3M glycine (Sigma) in PBS to quench PFA. Samples were blocked and permeabilized in a PBS solution containing 8% donkey serum (Jackson ImmunoResearch), 8% goat serum (Cell Signaling Technologies), and 0.3% Triton-X 100 (Sigma) for one hour, followed by primary antibody incubation in 1% BSA (Fisher Bioreagents), 1% goat serum, and 0.25% Triton-X overnight at 4° C. Samples were washed 3 times with PBS containing 0.1% BSA and 0.1% Triton-X 100 before incubating with fluorophore-conjugated secondary antibodies in PBS with 0.05% Triton-X 100 and 1% BSA at room temperature for one hour. Cells were washed with PBS with 0.1% Triton-X, stained with DAPI, and then covered with Mounting Media (Ibidi) before imaging. Primary antibody, HA-Tag 6E2 (Cell Signaling, 2367), was used at a 1:100 dilution as per manufacturer's instructions. Secondary antibodies used were goat anti-mouse IgG1-Alexa-Fluor 647 (Thermo Fisher, A21240) and Anti-Mouse IgG1 CF 633 (Sigma, SAB4600335). Confocal images were taken using a Zeiss Airyscan LSM 880 followed by image processing in Zen 2.3 (Zeiss).

Bacterial Small RNA Sequencing and Analysis

E. coli DH5α cells were transformed with pACYC184 carrying the CRISPR-Cas13d locus derived from an uncultured Ruminococcus sp. strain. Cells were harvested in stationary phase, rinsed in PBS, resuspended in TRIzol (Life Technologies), transferred to Lysing Matrix B tubes containing 0.1 mm silica beads (MP Biomedicals), and homogenized on a Bead Mill 24 (Fisher Scientific) for three 30-second cycles. Total RNA was isolated by phenol-chloroform extraction, then purified using the DirectZol Miniprep Kit (Zymo Research). RNA quality was assessed on an Agilent 2200 Tapestation followed by Turbo DNase treatment (Ambion). Total RNA was treated with T4 Polynucleotide Kinase (NEB) and rRNA-depleted using the Ribo-Zero rRNA Removal Kit for bacteria (Illumina). RNA was treated with RNA 5′ polyphosphatase, poly(A)-tailed with E. coli poly(A) polymerase, and ligated with 5′ RNA sequencing adapters using T4 RNA ligase 1 (NEB). cDNA was generated via reverse transcription using an oligo-dT primer and M-MLV RT/RNase Block (AffinityScript, Agilent) followed by PCR amplification and barcoding. Resulting libraries were sequenced on Illumina MiSeq, demultiplexed using custom Python scripts, and aligned to the Cas13d CRISPR locus using Bowtie 2. Alignments were visualized with Geneious.

Ngn2 Lentivirus Preparation

Low passage HEK 293FT cells were transfected with Polyethylenimine Max (PEI, Polysciences) and Ngn2 target plasmid plus pMD2.G and psPAX2 packaging plasmids (a gift from Didier Trono, Addgene #12259 and #12260) in DMEM+10% FBS media during plating. The following day, media was changed to serum-free chemically defined minimal medium (Ultraculture supplemented with Glutamax, Lonza). Viral supernatant was harvested 48 h later, clarified through a 0.45 micron PVDF filter (Millipore) and concentrated using ultracentrifugation.

AAV Preparation

Low passage HEK 293FT cells were transfected with Polyethylenimine Max (PEI, Polysciences) and AAV target plasmid plus AAV1 serotype and pAdDeltaF6 helper packaging plasmids (UPenn Vector Core) in DMEM+10% FBS media during plating. The following day, 60% of the media was changed to chemically defined minimal medium (Ultraculture supplemented with Glutamax, Lonza). 48 h later, AAV-containing supernatant was harvested and clarified through a 0.45 μm PVDF filter (Millipore) and concentrated using precipitation by polyethylene glycol (PEG virus precipitation kit #K904, Biovision) following the manufacturer's protocol.

RNA-Seq Library Preparation and Sequencing

48 h after transfection, total RNA was extracted from 293FT cells using the RNeasy Plus Mini kit from Qiagen. Stranded mRNA libraries were prepared using the NEBNext Ultra II Directional RNA Library Prep Kit from New England Biolabs (Cat #E7760S) and sequenced on an Illumina NextSeq500 with 42 nt paired end reads. ~15M total reads were demultiplexed per condition.

RNA-Seq Analysis

Sequenced reads were quality-tested using FASTQC and aligned to the hg19 human genome using the STAR aligner, version 2.5.1b (Dobin et al., 2013). Mapping was carried out using default parameters (up to 10 mismatches per read, and up to 9 multi-mapping locations per read). The genome index was constructed using the gene annotation supplied with the hg19 Illumina iGenomes collection (Illumina) and an sjdbOverhang value of 100. Uniquely mapped reads were quantified across all gene exons using the top-expressed isoform as proxy for gene expression with the HOMER analysis suite (Heinz et al., 2010), and differential gene expression was carried out with DESeq2 v 1.14.1 (Love et al., 2014) using triplicates to compute within-group dispersion and contrasts to compare between targeting and non-targeting conditions. Significant differentially expressed genes were defined as having a false discovery rate (FDR) <0.01 and a log2 fold change >0.75. Volcano plots were generated in R 3.3.2 using included plotting libraries and the alpha() color function from the scales 0.5.0 package.
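The significance criteria stated above can be expressed as a simple filter, sketched here in Python. This is an illustration rather than the DESeq2 analysis itself. Whether the log2 fold change threshold was applied to the absolute value is not stated in this passage; the `two_sided` default below is therefore an assumption, and the function name is hypothetical.

```python
def is_significant(padj, log2_fold_change,
                   fdr_cutoff=0.01, lfc_cutoff=0.75, two_sided=True):
    """Apply the FDR < 0.01 and log2 fold change > 0.75 criteria to one gene."""
    if padj is None:
        # Genes without an adjusted p-value (e.g., filtered genes) are not called.
        return False
    # two_sided=True treats up- and down-regulation symmetrically (assumption).
    effect = abs(log2_fold_change) if two_sided else log2_fold_change
    return padj < fdr_cutoff and effect > lfc_cutoff
```

Under the two-sided assumption, a gene with an adjusted p-value of 0.001 and a log2 fold change of −1.2 would be called significant, whereas the same gene would not be called if only positive changes were counted.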

Statistics

All values are reported as mean±SD or mean±SEM as indicated in the appropriate figure legends. For comparing two groups, a one-tailed Student's t-test was used and statistical significance was determined using the Holm-Sidak method with alpha=0.05. A one-way ANOVA with Tukey multiple hypothesis correction was used to assess significance between more than two groups. Two-way ANOVA was used when comparing across two factors (i.e., RNA targeting modality and guide position) and adjusted for multiple hypothesis correction by Sidak's multiple comparisons test. For comparing groups that were found to not meet the assumption of a normal distribution by a D'Agostino and Pearson normality test, the non-parametric Friedman test with Dunn's multiple comparison adjustment was performed. PRISM 7.0 was used for all statistical analysis. Sample sizes were not determined a priori. At least three biological replicates were used for each experiment, as indicated specifically in each figure.

Sequencing data reported herein can be found in the NCBI Gene Expression Omnibus under GEO Series accession number GSE108519.

Additional details on the materials and methods used, such as sequences (e.g., Tables S1 to S5), can be found in Konermann et al., Cell 173:1-12, 2018, herein incorporated by reference in its entirety.

Example 2 Computational Identification of a Type VI-Like Cas Ribonuclease Family

This example describes methods used to identify previously undetected or uncharacterized RNA-targeting CRISPR-Cas systems by developing a computational pipeline for class 2 CRISPR-Cas loci, which require only a single nuclease for CRISPR interference such as Cas9, Cas12a (formerly Cpf1), or Cas13a (formerly C2c2) (Makarova et al., 2015; Shmakov et al., 2015). To improve upon previous strategies for bioinformatic mining of CRISPR systems, which focus on discovering sets of conserved Cas genes involved in spacer acquisition (Shmakov et al., 2015), the minimal requirements for a CRISPR locus were defined as the presence of a CRISPR repeat array and a nearby effector nuclease. Using the CRISPR array as a search anchor, all prokaryotic genome assemblies and scaffolds were obtained from the NCBI WGS database, and algorithms for de novo CRISPR array detection (Bland et al., 2007; Edgar, 2007; Grissa et al., 2007) were adapted to identify 21,175 putative CRISPR repeat arrays (FIG. 1A).

Up to 20 kilobases (kb) of genomic DNA sequence flanking each CRISPR array was extracted to identify predicted protein-coding genes in the immediate vicinity. Candidate loci containing signature genes of known class 1 and class 2 CRISPR-Cas systems such as Cas3 or Cas9 were excluded from further analysis, except for Cas12a and Cas13a to judge the ability of the pipeline to detect and cluster these known class 2 effector families. To identify new class 2 Cas effectors, it was required that candidate proteins be >750 residues in length and within 5 protein-coding genes of the repeat array, as large proteins closely associated with CRISPR repeats are key characteristics of known single effectors. The resulting proteins were classified into 408 putative protein families using single-linkage hierarchical clustering based on homology.
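Single-linkage clustering of this kind can be sketched in Python using a union-find structure, in which any pairwise hit at or above a bit score threshold links two proteins into the same family. The threshold of 60 is taken from the materials and methods of Example 1; the function name, the tuple layout of the hits, and the use of union-find are assumptions made for illustration and do not represent the actual pipeline code.

```python
def single_linkage_clusters(names, hits, min_bits=60.0):
    """Group proteins into families; `hits` holds (query, subject, bit_score) tuples."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, bits in hits:
        if bits >= min_bits:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q  # a single qualifying link merges two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Largest clusters first; members sorted for a stable result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))
```

Because linkage is single, proteins A and C fall into one family when A matches B and B matches C, even if A and C do not match each other directly above the threshold.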

To discard protein clusters that reside in close proximity to CRISPR arrays due to chance or overall abundance in the genome, additional homologous proteins to each cluster were identified from the NCBI non-redundant protein database and their proximity to a CRISPR array determined. Reasoning that true Cas genes would have a high co-occurrence rate with CRISPR repeats, >70% of the proteins for each expanded cluster were required to exist within kb of a CRISPR repeat. These remaining protein families were analyzed for nuclease domains and motifs.
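The co-occurrence criterion can be illustrated with the following Python sketch, which computes the fraction of proteins in an expanded cluster that lie near a CRISPR repeat and applies the >70% requirement. The input representation (one boolean per protein) and the function name are hypothetical; how proximity itself was determined is outside the scope of the sketch.

```python
def passes_cooccurrence(near_array_flags, min_fraction=0.70):
    """True if more than `min_fraction` of cluster members sit near a CRISPR repeat."""
    flags = list(near_array_flags)
    if not flags:
        # An empty cluster cannot provide evidence of CRISPR association.
        return False
    fraction = sum(1 for f in flags if f) / len(flags)
    # Strictly greater than the cutoff, matching the ">70%" wording.
    return fraction > min_fraction
```

A cluster in which 8 of 10 homologs are array-proximal passes the filter, whereas a cluster with exactly 7 of 10 does not.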

Among the candidates, which include the recently described Cas13b system (Smargon et al., 2017), a family of uncharacterized putative class 2 CRISPR-Cas systems encoding a candidate CRISPR-associated ribonuclease containing 2 predicted HEPN ribonuclease motifs (Anantharaman et al., 2013) was identified (FIG. 2A). Importantly, they are among the smallest class 2 CRISPR effectors described to date (~930 aa). The Type VI CRISPR-Cas13 superfamily is exemplified by sequence-divergent, single-effector signature nucleases and the presence of two HEPN domains. Other than these two RxxxxH HEPN motifs (FIG. 3A), the candidate effectors have no significant sequence similarity to previously described Cas13 enzymes, so this family of putative CRISPR ribonucleases is designated as Type VI Cas13d, or Type VI-D (FIG. 3B).
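A scan for the RxxxxH motif (arginine, any four residues, histidine) in a protein sequence can be sketched in Python with a regular expression. The pattern alone is highly permissive and will match many unrelated proteins, so this is only a first-pass illustration and not a substitute for the alignment-based identification described above; the function name and the example sequence in the usage note are invented.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
HEPN_MOTIF = re.compile(r"(?=(R.{4}H))")

def find_hepn_motifs(protein_seq):
    """Return (1-based position, matched residues) for every RxxxxH occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HEPN_MOTIF.finditer(seq)]
```

Applied to the made-up sequence `MKRAAAAHGGSRNQSTHLL`, the function reports motifs at positions 3 (`RAAAAH`) and 12 (`RNQSTH`).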

CRISPR-Cas13d systems are derived from gut-resident microbes, so we sought to expand the Cas13d family via alignment to metagenomic contigs from recent large-scale microbiome sequencing efforts. Comparison of Cas13d proteins against public metagenome sequences without predicted open reading frames (ORFs) identified additional full-length systems as well as multiple effector and array fragments that cluster in several distinct branches (FIG. 1B). To generate full-length Cas13d ortholog proteins and loci from the different branches of the Cas13d protein family, genomic DNA samples were obtained from associated assemblies and targeted Sanger sequencing was performed to fill in gaps due to incomplete sequencing coverage, such as for the metagenomic ortholog ‘Anaerobic digester metagenome’ (Adm) (Treu et al., 2016).

Cas13d CRISPR loci are largely clustered within benign, Gram-positive gut bacteria of the genus Ruminococcus, and exhibit a surprising diversity of CRISPR locus architectures (FIG. 2A). With the exception of the metagenomic AdmCas13d system, Cas13d systems lack the key spacer acquisition protein Cas1 (Yosef et al., 2012) within their CRISPR locus, highlighting the utility of a class 2 CRISPR discovery pipeline without Cas1 or Cas2 gene requirements. Cas13d direct repeats (DRs) are highly conserved in length and predicted secondary structure (FIG. 3C), with a 36 nt length, an 8-10 nt stem with A/U-rich loop, and a 5′-AAAAC motif at the 3′ end of the direct repeat (FIG. 3D). This conserved 5′-AAAAC motif has been shown to be specifically recognized by a type II Cas1/2 spacer acquisition complex (Wright and Doudna, 2016). In fact, Cas1 can be found in relative proximity to some Cas13d systems (within 10-30 kb for P1E0 and Rfx) while the remaining Cas13d-containing bacteria contain Cas1 elsewhere in their genomes, likely as part of another CRISPR locus.

Example 3 CRISPR-Cas13d Possesses Dual RNase Activities

To demonstrate that the Cas13d repeat array is transcribed and processed into CRISPR guide RNAs (gRNA), the Cas13d CRISPR locus was cloned from an uncultured Ruminococcus sp. sample (Ur) into a bacterial expression plasmid. CRISPR systems tend to form self-contained operons with the necessary regulatory sequences for independent expression, facilitating heterologous expression in E. coli (Gasiunas et al., 2012). RNA sequencing (Heidrich et al., 2015) revealed processing of the array into ~52 nt mature gRNAs, with a 30 nt 5′ direct repeat followed by a variable 3′ spacer that ranged from 14-26 nt in length (FIG. 2B).

To characterize Cas13d properties in vitro, Eubacterium siraeum Cas13d protein (EsCas13d) was purified based on its robust recombinant expression in E. coli (FIGS. 4A-4C). EsCas13d was found to be solely sufficient to process its matching CRISPR array into constituent guides without additional helper ribonucleases (FIG. 2C, Table S1 of Konermann et al., “Transcriptome Engineering with RNA-Targeting Type VI-D CRISPR Effectors,” Cell 173:1-12, 2018, herein incorporated by reference in its entirety), a property shared by some class 2 CRISPR-Cas systems (East-Seletsky et al., 2016; Fonfara et al., 2016; Smargon et al., 2017). Furthermore, inactivating the positively charged catalytic residues of the HEPN motifs (Anantharaman et al., 2013) (dCas13d: R295A, H300A, R849A, H854A) did not affect array processing, indicating a distinct RNase activity dictating gRNA biogenesis analogous to Cas13a (East-Seletsky et al., 2016; Liu et al., 2017).

Cas effector proteins typically form a binary complex with mature gRNA to generate an RNA-guided surveillance ribonucleoprotein capable of cleaving foreign nucleic acids for immune defense (van der Oost et al., 2014). To assess if Cas13d has programmable RNA targeting activity as indicated by the presence of two HEPN motifs, EsCas13d protein was paired with an array or a mature gRNA along with a cognate in vitro-transcribed target. Based on the RNA sequencing results, a mature gRNA containing a 30 nt direct repeat and an intermediate spacer length of 22 nt was selected (nucleotides 6-36 of SEQ ID NO: 129, followed by 22 bases complementary to the RNA target).
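The guide layout described above (a 5′ direct repeat followed by a 3′ spacer complementary to the target) can be sketched in Python as follows. The direct repeat is passed in as an argument and is represented in the usage note by a placeholder of 30 `N` characters, because the actual sequence (SEQ ID NO: 129) is not reproduced here; the function names and the example target sequence are likewise hypothetical.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq):
    """Reverse complement of an RNA (or DNA, with T read as U) sequence."""
    return seq.upper().replace("T", "U").translate(_RNA_COMPLEMENT)[::-1]

def design_guide(target_rna, site_start, direct_repeat, spacer_len=22):
    """Return direct repeat + spacer, with the spacer complementary to the chosen site."""
    site = target_rna[site_start:site_start + spacer_len]
    if site_start < 0 or len(site) != spacer_len:
        raise ValueError("target site does not fit within the target RNA")
    # The spacer must base-pair with the target, so it is the reverse complement.
    return direct_repeat + reverse_complement_rna(site)
```

For example, `design_guide("AUGGCUAGCUAGGAUCCGAUCGAUUACGGCAUAG", 5, "N" * 30)` returns a 52 nt string whose last 22 nt are the reverse complement of target positions 5 through 26 (0-based).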

Cas13d was able to efficiently cleave the complementary target ssRNA with both the unprocessed array and mature gRNA in a guide-sequence dependent manner, while non-matching spacer sequences abolished Cas13d activity (FIG. 5A). Substitution with dCas13d or the addition of EDTA to the cleavage reaction also abolished guide-dependent RNA targeting, indicating that Cas13d targeting is HEPN- and Mg²⁺-dependent (FIG. 5B). To determine the minimal spacer length for efficient Cas13d targeting, a series of spacer truncations ranging from the unprocessed 30 nt length down to 10 nt were generated (FIG. 6A). Cleavage activity dropped significantly below a 21 nt spacer length, confirming the choice of a 22 nt spacer (FIG. 6B).
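A spacer truncation series of this kind can be generated with the short Python sketch below. The passage states only the range of lengths (30 nt down to 10 nt); the step size of 2 nt and the choice to trim from the 3′ end of the spacer (the end distal to the direct repeat) are assumptions made for illustration, and the function name is hypothetical.

```python
def truncation_series(spacer, min_len=10, step=2):
    """Return progressively shorter spacers, trimmed from the 3' end (assumption)."""
    if step < 1 or min_len < 1 or len(spacer) < min_len:
        raise ValueError("invalid truncation parameters")
    # Keep the 5' portion adjacent to the direct repeat and drop 3' nucleotides.
    return [spacer[:n] for n in range(len(spacer), min_len - 1, -step)]
```

Starting from a 30 nt spacer with the default settings, the series contains 11 spacers of lengths 30, 28, and so on down to 10 nt.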

RNA-targeting class 2 CRISPR systems have been proposed to act as sensors of foreign RNAs (Abudayyeh et al., 2016; East-Seletsky et al., 2016), where general RNase activity of the effector nuclease is triggered by a guide-matching target. To assay for a similar property in Cas13d, RNase activity of the binary EsCas13d:gRNA complex was monitored in the presence of a matching RNA target. It was observed that EsCas13d can be activated by target RNA to cleave bystander RNA targets (FIG. 3C), albeit inefficiently relative to its activity on the complementary ssRNA target. Bystander cleavage is guide sequence- and HEPN-dependent, as the presence of non-matching bystander target alone was insufficient to induce cleavage while substitution of dCas13d or addition of EDTA abolished activity. These results indicate that bystander RNase activity may be a general property of RNA-targeting class 2 systems in CRISPR adaptive bacterial immunity (FIG. 3D).

To assess the generalizability of Cas13d reprogramming, twelve guides tiling a complementary RNA target were generated and efficient cleavage in all cases was observed (FIG. 7A). Cas13d was unable to cleave a ssDNA (FIG. 6C) or dsDNA (FIG. 6D) version of the ssRNA target, indicating that Cas13d is an RNA-specific nuclease. Further, RNA target cleavage did not depend on the protospacer flanking sequence (PFS) (FIG. 7A) in contrast to other RNA-targeting class 2 systems, which require a 3′-H (Abudayyeh et al., 2016) or a double-sided, DR-proximal 5′-D and 3′-NAN or NNA (Smargon et al., 2017). Although a slight bias against an adenine PFS was initially observed (FIG. 6E), varying the target PFS base with a constant guide sequence resulted in no significant differences (P=0.768) in targeting efficiency (FIG. 6F).

While DNA-targeting class 2 CRISPR systems (Gasiunas et al., 2012; Jinek et al., 2012; Zetsche et al., 2015) and some RNA-targeting class 1 systems tend to cleave at defined positions relative to the target-guide duplex (Samai et al., 2015; Zhang et al., 2016), the Cas13d cleavage pattern varies for different targets (FIGS. 5A, 5C, 6H) and remains remarkably similar regardless of the guide sequence position (FIG. 7A). This indicates that Cas13d may preferentially cleave specific sequences or structurally accessible regions in the target RNA. Cas13d activity was tested on targets containing variable homopolymer repeats in the loop region of a hairpin or as a linear single-stranded repeat. EsCas13d exhibited significant preference for uracil bases in both target structures, with lower but detectable activity at all other bases (FIG. 7B).

Cas enzymes are found in nearly all archaea and about half of bacteria (Hsu et al., 2014; van der Oost et al., 2014), spanning a wide range of environmental temperatures. To determine the optimal temperature range for Cas13d activity, a spectrum of cleavage temperature conditions from 16-62° C. was tested, and maximal activity was observed in the 24-41° C. range (FIGS. 6G, 6H). This temperature range is compatible with a wide range of prokaryotic and eukaryotic hosts, indicating Cas13d can be adapted for RNA targeting in different cells and organisms.

Example 4 Cell-Based Activity Screen of Engineered Orthologs

The Cas13d nuclease was used as a flexible tool for programmable RNA targeting in mammalian cells. CRISPR orthologs from distinct bacterial species commonly exhibit variable activity (Abudayyeh et al., 2017; East-Seletsky et al., 2017), especially upon heterologous expression in human cells (Ran et al., 2015; Zetsche et al., 2017). Highly active Cas13d orthologs were identified in a eukaryotic cell-based mCherry reporter screen.

By synthesizing human codon-optimized versions of 7 orthologs from distinct branches within the Cas13d family (FIG. 1B), mammalian expression plasmids carrying the catalytically active and HEPN-inactive proteins were generated. Each protein was then optionally fused to N- and C-terminal nuclear localization signals (NLS). These Cas13d effector designs were HA-tagged and paired with two distinct guide RNA architectures, either with a 30 nt spacer flanked by two direct repeat sequences to mimic an unprocessed guide RNA (pre-gRNA) or a 30 nt direct repeat with 22 nt spacer (gRNA) predicted to mimic mature guide RNAs (FIG. 8A). For each guide design, four distinct spacer sequences complementary to the mCherry transcript were then pooled to minimize potential spacer-dependent variability in targeting efficiency. The ability of Cas13d to knock down mCherry protein levels was determined in a human embryonic kidney (HEK) 293FT cell-based reporter assay.

48 hours post-transfection, flow cytometry indicated that RfxCas13d and AdmCas13d efficiently knocked down mCherry protein levels by up to 92% and 87% (P<0.0003), respectively, relative to a non-targeting control guide (FIG. 8B). In contrast, EsCas13d along with RaCas13d and RffCas13d exhibited limited activity in human cells. Furthermore, none of the HEPN-inactive Rfx-dCas13d constructs significantly affected mCherry fluorescence, indicating HEPN-dependent knockdown (P>0.43 for all cases). Robust nuclear translocation of the Rfx and AdmCas13d NLS fusion constructs was observed via immunocytochemistry, while the wild-type effectors remained primarily extra-nuclear (FIG. 8C).

Proceeding with RfxCas13d and AdmCas13d as lead candidates, we next compared their ability to knock down endogenous transcripts. To determine the optimal ortholog and guide architecture, the capability of Rfx and AdmCas13d construct variants to target β-1,4-N-acetyl-galactosaminyltransferase 1 (B4GALNT1) transcripts was systematically assayed. In each condition, four guides containing distinct spacer sequences tiling the B4GALNT1 transcript were pooled. The RfxCas13d-NLS fusion targeted B4GALNT1 more efficiently than wild-type RfxCas13d and both variants of AdmCas13d, with both the gRNA and pre-gRNA mediating potent knockdown (˜82%, P<0.0001) (FIG. 8D). Cas13d-NLS from Ruminococcus flavefaciens strain XPD3002 was therefore selected for the remaining experiments (CasRx).

Example 5 Programmable RNA Knockdown in Human Cells with CasRx

Because Cas13d is capable of processing its own CRISPR array, this property was leveraged for the simultaneous delivery of multiple targeting guides in a simple single-vector system (FIG. 9A). Arrays encoding four spacers that each tile the transcripts of mRNAs (B4GALNT1 and ANXA4) or nuclear localized lncRNAs (HOTTIP and MALAT1) consistently facilitated robust (>90%) RNA knockdown by CasRx (P<0.0001) (FIG. 9B).

CasRx-mediated RNA interference was compared to more established technologies for transcript knockdown or repression, namely dCas9-mediated CRISPR interference (Gilbert et al., 2014; Gilbert et al., 2013) and spacer sequence-matched shRNAs, via transient transfection (FIG. 9C). For CRISPRi-based repression, the most potent dCas9 guide for B4GALNT1 from previous reports was analyzed (Gilbert et al., 2014; Zalatan et al., 2015). Across 3 endogenous transcripts, CasRx outperformed shRNAs (11/11) and CRISPRi (4/4) in each case (FIG. 9D), exhibiting a median knockdown of 96% compared to 65% for shRNA and 53% for CRISPRi after 48 hours. In addition, knockdown by CasRx was compared to two recently described Cas13a and Cas13b effectors (Abudayyeh et al., 2017; Cox et al., 2017) (FIG. 10A). Across three genes and eight guide RNAs, CasRx mediated significantly greater transcript knockdown than both LwaCas13a-msfGFP-NLS and PspCas13b-NES (median: 97% compared to 80% and 66% respectively, P<0.0001) (FIG. 10B).

RNAi has been widely used to disrupt any gene of interest due to a combination of simple re-targeting principles, scalable synthesis, knockdown potency, and ease of reagent delivery. However, widespread off-target transcript silencing has been a consistent concern (Jackson et al., 2003; Sigoillot et al., 2012), possibly due to the entry of RNAi reagents into the endogenous miRNA pathway (Doench et al., 2003; Smith et al., 2017). Consistent with these reports, upon RNA sequencing of human cells transfected with a B4GALNT1-targeting shRNA, widespread off-target transcriptional changes relative to a non-targeting shRNA were observed (>500 significant off-target changes, P<0.01, FIGS. 9E, 9G). In contrast, transcriptome profiling of spacer-matched CasRx guide RNAs revealed no significant off-target changes other than the targeted transcript (FIG. 9F). This indicates that the moderate bystander cleavage observed in vitro (FIG. 5C) may not result in observable off-target transcriptome perturbation in mammalian cells. A similar pattern was observed when targeting ANXA4 (FIGS. 11A-11B), with over 900 significant off-target changes resulting from shRNA targeting compared to zero with CasRx (FIG. 9G).

To confirm that CasRx interference is broadly applicable, a panel of 11 additional genes with diverse roles in cancer, cell signaling, and epigenetic regulation was selected and 3 guides per gene were screened. CasRx consistently mediated high levels of transcript knockdown across genes with a median reduction of 96% (FIG. 9H). Each tested guide mediated at least 80% knockdown, underscoring the consistency of the CasRx system for RNA interference.

Example 6 Splice Isoform Engineering with dCasRx

The experiments on RNA targeting with CasRx revealed that target RNA and protein knockdown is dependent on the catalytic activity of the HEPN domains (FIGS. 8B, 5B). The same guide sequences mediating efficient knockdown with CasRx failed to significantly reduce mCherry levels when paired with catalytically inactive dCasRx (FIG. 8B), indicating that targeting of dCasRx to the coding portion of mRNA does not necessarily perturb protein translation. This observation indicated the possibility of utilizing dCasRx for targeting of specific coding and non-coding elements within a transcript to study and manipulate RNA. To validate this concept, the utility of the dCasRx system was expanded by creating a splice effector.

Alternative splicing is generally regulated by the interaction of cis-acting elements in the pre-mRNA with positive or negative trans-acting splicing factors, which can mediate exon inclusion or exclusion (Matera and Wang, 2014; Wang et al., 2015). It was reasoned that dCasRx binding to such motifs may be sufficient for targeted isoform perturbation. For proof-of-concept, distinct splice elements were identified in a bichromatic splicing reporter containing DsRed upstream of mTagBFP2 in two different reading frames following an alternatively spliced exon (Orengo et al., 2006) (FIG. 12A). Inclusion or exclusion of this second exon toggles the reading frame and resulting fluorescence, facilitating quantitative readout of splicing patterns by flow cytometry. To mediate exon skipping, four guide RNAs were designed to target the intronic branchpoint nucleotide, splice acceptor site, putative exonic splice enhancer, and splice donor of exon 2.

One widespread family of negative splice factors is the highly conserved heterogeneous nuclear ribonucleoproteins (hnRNPs), which typically inhibit exon inclusion via a C-terminal, glycine-rich domain (Wang et al., 2015). The splicing reporter was targeted with dCasRx and engineered fusions to the Gly-rich C-terminal domain of hnRNPa1, one of the most abundant hnRNP family members (FIG. 12B).

Guide position appears to be a major determinant of the efficiency of engineered exon skipping. While each guide position mediated a significant increase in exon exclusion (P<0.0001 in all cases) relative to the non-targeting guide, targeting the splice acceptor resulted in the most potent exon exclusion (increase from 8% basal skipping to 65% for dCasRx alone and 75% with hnRNPa1 fusion). By comparison, dLwaCas13a-msfGFP-NLS mediated significantly lower levels of exon skipping across all four positions (19% skipping for splice acceptor guide) (FIGS. 10C and 10D, P<0.0001).

Targeting all 4 positions simultaneously with a CRISPR array achieved higher levels of exon skipping than individual guides alone (81% for dCasRx and 85% for hnRNPa1 fusion, P<0.006 compared to the splice acceptor (SA) guide) (FIG. 12B). These results indicate that dCasRx allows for tuning of isoform ratios through varying guide placement and suggest that it can be leveraged as an efficient RNA binding module in human cells for targeting and manipulation of specific RNA elements.

Example 7 Viral Delivery of dCasRx to a Neuronal Model of Frontotemporal Dementia

The Cas13d family averages 930 amino acids in length, in contrast to Cas9 (1100 aa to ˜1400 aa depending on subtype, with compact outliers such as CjCas9 or SaCas9), Cas13a (1250 aa), Cas13b (1150 aa), and Cas13c (1120 aa) (FIG. 3B) (Chylinski et al., 2013; Cox et al., 2017; Hsu et al., 2014; Kim et al., 2017; Shmakov et al., 2015; Smargon et al., 2017). Although adeno-associated virus (AAV) is a versatile vehicle for transgene delivery and gene therapy due to its broad range of capsid serotypes, low levels of insertional mutagenesis, and lack of apparent pathogenicity, its limited packaging capacity (˜4.7 kb) makes it challenging to effectively deliver many single effector CRISPR enzymes (Abudayyeh et al., 2017; Ran et al., 2015; Swiech et al., 2015). The remarkably small size of Cas13d effectors renders them uniquely suited for all-in-one AAV delivery with a CRISPR array, an optional effector domain, and requisite expression or regulatory elements (FIG. 12C).
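The size argument above can be illustrated with simple arithmetic. The following is a minimal sketch, assuming three bases per residue plus one stop codon for each open reading frame and taking the ˜4.7 kb figure as a hard limit; it ignores inverted terminal repeats, promoters, the guide array, and other regulatory elements, all of which also consume capacity. The function and variable names are illustrative only.

```python
# Approximate AAV packaging capacity quoted in the text (~4.7 kb).
AAV_CAPACITY_BP = 4700

# Effector lengths in amino acids, as quoted in the text.
EFFECTOR_LENGTH_AA = {
    "Cas13d": 930,
    "Cas13c": 1120,
    "Cas13b": 1150,
    "Cas13a": 1250,
    "Cas9 (upper end)": 1400,
}


def coding_length_bp(length_aa: int) -> int:
    """Return the ORF length: three bases per residue plus one stop codon."""
    return 3 * (length_aa + 1)


def remaining_capacity_bp(length_aa: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Return the bases left over for promoter, guide array, and other elements."""
    return capacity_bp - coding_length_bp(length_aa)


if __name__ == "__main__":
    for name, length_aa in EFFECTOR_LENGTH_AA.items():
        print(f"{name}: ORF {coding_length_bp(length_aa)} bp, "
              f"{remaining_capacity_bp(length_aa)} bp remaining")
```

Under these assumptions an average Cas13d open reading frame leaves roughly 1.9 kb for other elements, whereas a 1400 aa Cas9 leaves about 0.5 kb.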

Frontotemporal Dementia with Parkinsonism linked to Chromosome 17 (FTDP-17) is an autosomal dominant major neurodegenerative disease caused by diverse point mutations in MAPT, the gene encoding tau. Tau exists as two major isoforms in human neurons, 4R and 3R, which are distinguished by the presence or absence of tau exon 10 and thus contain 4 or 3 microtubule binding domains. The balance of these two isoforms is generally perturbed in FTDP-17 as well as other tauopathies, driving the progression of neurodegeneration (Boeve and Hutton, 2008). Some forms of FTD are caused by mutations in the intron following MAPT exon 10 which disrupt an intronic splice silencer and elevate the expression of 4R tau (Kar et al., 2005), thereby inducing pathological changes (Schoch et al., 2016).

It was reasoned that dCasRx targeted to MAPT exon 10 could induce exon exclusion to alleviate dysregulated 4R/3R tau ratios. Patient-derived human induced pluripotent stem cells (hiPSCs) were differentiated into cortical neurons via Neurogenin-2 directed differentiation for 2 weeks (Zhang et al., 2013). Postmitotic neurons were then transduced with AAV1 carrying dCasRx (FIG. 12D) paired with a repeat array containing 3 spacers that target the exon 10 splice acceptor and two putative exonic splice enhancers (FIG. 12E). dCasRx-mediated exon exclusion was able to reduce the relative 4R/3R tau ratio by nearly 50% relative to a BFP vehicle control (FIG. 12F) and to a level similar to unaffected control neurons, demonstrating that CasRx can be exploited for transcriptional modulation in primary cell types via AAV delivery.

Example 8 RNA Targeting in Human Cells Using Cas13d

RNA can be targeted in human cells using the active Cas13d nuclease. As a proof of concept, human U-2 OS bone osteosarcoma cells were stably integrated with an mCherry reporter and transfected with plasmids encoding human codon optimized Cas13d and guide RNAs targeting the mCherry transcript (FIG. 13).

Cas13d proteins were also fused with N- and C-terminal NLS sequences (SPKKKRKVEAS, SEQ ID NO: 256, for the N-terminal NLS and GPKKKRKVAAA, SEQ ID NO: 258, for the C-terminal NLS) to understand if nuclear localization can affect mCherry knockdown (these are denoted the 2x NLS constructs). Guide RNAs were provided in a vector with a U6 promoter operably linked either to a 36 nt DR-30 nt spacer-36 nt DR sequence, which mimics the unprocessed CRISPR guide array (denoted DR36), or to a 30 nt DR-22 nt spacer to mimic the processed, mature gRNA (denoted gRNA). The DR36 construct is presumed to be processed by Cas13d into mature gRNAs within the cell. The spacer sequences within the DR36 or gRNA molecules were either complementary to the mCherry target RNA (on-target mCherry) or computationally optimized to avoid complementarity to mCherry or any endogenous human transcript (non-targeting mCherry).

mCherry knockdown was quantified via flow cytometry and normalized to a transfection control. The non-targeting mCherry guides did not affect mCherry protein levels as measured by flow cytometry, presumably because the mCherry transcripts are not targeted. However, the on-target mCherry guides paired with 4 different Cas13d orthologues exhibited significant mCherry knockdown (FIG. 13). “XPD” refers to Ruminococcus flavefaciens XPD3002 Cas13d (SEQ ID NO: 92); “P1E0” refers to Gut metagenome P1E0 Cas13d (SEQ ID NO: 83); “AnDig” refers to Anaerobic digester gut metagenome Cas13d (SEQ ID NO: 42); “Uncultured” refers to uncultured Ruminococcus sp. Cas13d.

Example 9 In Vivo RNA Targeting Using Cas13d

RNA can be targeted in mouse models of cancer. To observe which cells in the mouse are expressing EGFR, a guide RNA is designed that includes one or more spacer regions complementary to mouse EGFR and is combined with Cas13d having a mutated HEPN domain (such as SEQ ID NO: 2 or 4), and a biotin label. The gRNA and Cas13d coding sequence are cloned into a viral vector (such as a lentivirus) which is used to infect the mice by tail vein injection at a titer to ensure 100% infection rates. A fluorescent streptavidin label is administered to the mice. Cells expressing EGFR are visualized and detected with the appropriate excitation frequency for the fluorescent label. Alternatively, Cas13d is delivered in its active form in vivo to mediate target knock-down.

Example 10 Treatment of Cancer

Human subjects with histologically confirmed stage 1, EGFR+ breast cancer can be treated with the disclosed methods. Each subject is administered a complex comprising an active Cas13d or a Cas13d protein mutated in the HEPN domain (such as SEQ ID NO: 2 or 4), a guide RNA targeting EGFR, and a toxin, after receiving lumpectomy surgery. Treated individuals are monitored for breast cancer recurrence.

Example 11 Treatment of HIV Infection

Human subjects with HIV infection can be treated with the disclosed methods. Each subject is administered a construct comprising an active Cas13d or a Cas13d protein mutated in the HEPN domain (such as SEQ ID NO: 2 or 4), a guide RNA targeting HIV Nef protein, and a toxin. Treated individuals are monitored for HIV progression.

Example 12 Treatment of Huntington's Disease

Human subjects with Huntington's Disease can be treated with the disclosed methods. Each individual is administered a construct comprising Cas13d and a guide RNA targeting the Huntington mutation. Treated individuals are monitored for disease progression.

Example 13 Alternative Splicing Using Cas13d

Cas13d splice effectors can be used for therapeutic protein restoration (for example, where the loss results from a mutation or deletion), gene knockdown via frameshift induction, tuning or restoring a desired isoform ratio, or inducing a desired dominant splice isoform (FIG. 14). Alternative splicing is generally regulated by the interaction of cis-acting elements in the pre-mRNA with positive or negative trans-acting splicing factors, which can mediate exon inclusion or exclusion. dCas13d and Cas13d, with optional fusion to positive or negative splicing factors, can be used as splicing effectors that are targeted to said cis-acting elements in the pre-mRNA to manipulate splicing. Such elements can include exonic splicing enhancers or suppressors, intronic splicing enhancers or suppressors, splice acceptor and splice donor sites, and more generally protein- or RNA-interacting motifs or elements on that particular pre-mRNA, mRNA, or other RNA species, such as a non-coding RNA, tRNA, miRNA, and the like.

Additionally, the effects of Cas13d-based splice effectors can be guide position-dependent. This can be exploited to perturb or discover particular motifs or sites in an RNA transcript such as protein-binding sites, via steric hindrance, blocking, recruitment, or effector-mediated interaction. For example, the interaction between non-coding RNAs and particular chromatin remodeling complexes can be perturbed. Access to the ribosomal binding site and other elements in a 5′ or 3′ UTR can be blocked (or appropriate effector domains can be recruited) to decrease, increase, or otherwise manipulate translation.

Targeting or tiling Cas13d guides along a pre-mRNA can be used to discover or map new cis-acting elements such as intronic or exonic splice enhancers. This has been exploited in a therapeutic context in the case of the dystrophin gene for optimal antisense oligonucleotide positioning and can also be used for optimal Cas13d positioning. This can also be used to map, mask, or otherwise perturb RNA zipcodes or other cis-acting elements to affect trafficking and localization, chromatin remodeling, polyadenylation, RNA stability and half-life, or levels of nonsense-mediated decay.
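Tiling guides along a transcript can be sketched computationally as a sliding window over the target RNA, with each spacer taken as the reverse complement of its window. The following is an illustrative sketch only: the 22 nt spacer length follows the mature gRNA design described in the examples above, while the step size, the function names, and the use of the first 58 bases of the ccdB target RNA of Example 18 as input are assumptions made for demonstration. It does not model secondary structure, accessibility, or off-target complementarity.

```python
# Watson-Crick complement table for RNA.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def tile_spacers(target: str, spacer_len: int = 22, step: int = 12):
    """Return (start, target_site, spacer) tuples for windows along the target.

    start is the 0-based position of the window in the target; the spacer is
    the reverse complement of the window, so that it can base pair with it.
    """
    target = target.upper().replace("T", "U")
    tiles = []
    for start in range(0, len(target) - spacer_len + 1, step):
        site = target[start:start + spacer_len]
        tiles.append((start, site, reverse_complement(site)))
    return tiles


# First 58 bases of the ccdB target RNA listed in Example 18 (illustrative input).
EXAMPLE_TARGET = "AUGCAGUUUAAGGUUUACACCUAUAAAAGAGAGAGCCGUUAUCGUCUGUUUGUGGAUG"

if __name__ == "__main__":
    for start, site, spacer in tile_spacers(EXAMPLE_TARGET):
        print(start, site, spacer)
```

A smaller step gives denser tiling at the cost of more guides to screen; the choice of step is a design parameter and is not specified in the text.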

In one example, targeting an RNA allows for changing splicing of the target RNA. Both the direct binding of splice acceptor and/or donor sites and splice effector domains can be used to manipulate splicing. For example, by using a dCas13d protein with a mutated HEPN domain (e.g., SEQ ID NO: 2 or 4), a guide RNA containing at least one spacer sequence specific for the target RNA, and optionally an effector domain that affects splicing (such as the RS-rich domain of SRSF1, the Gly-rich domain of hnRNPA1, the alanine-rich motif of RBM4, or the proline-rich motif of DAZAP1), alternative splicing of the RNA can be achieved.

In some examples, such a method is used for exon inclusion, for example to include exon 2 of acid alpha-glucosidase (GAA) to treat Pompe disease or to include exon 7 of SMN2 to treat spinal muscular atrophy (SMA). In some examples, such a method is used for exon exclusion, for example to restore the reading frame of dystrophin to treat Duchenne muscular dystrophy, to shift the splicing of the Bcl-x pre-mRNA from the antiapoptotic long isoform to the proapoptotic short isoform to treat cancer, to shift the splicing of the MAPT transcript to affect ratios of 3R and 4R tau, or to manipulate the splicing of the lamin A transcript in the case of Hutchinson-Gilford progeria syndrome or other genetic diseases of aging.

In some examples, the method uses a Cas13d protein, optionally with a mutated HEPN domain, to mask splice acceptor or donor sites, for example to create neoantigens to make cold tumors hot. By affecting the splicing of certain target pre-mRNAs, this method can generate novel exon-exon junctions that can lead to the creation of neo-epitopes in cancer cells. This can make a cancer cell vulnerable to the immune system due to the display of unnatural antigens. In other examples, this method can be used to dynamically manipulate isoform ratios or to restore the reading frame of a protein (e.g., dystrophin for Duchenne muscular dystrophy).

Example 14 AAV Delivery of Cas13d

As described in the examples above, Cas13d can be effectively packaged into AAV to mediate expression in cell types that are not amenable to plasmid delivery or for in vivo delivery of Cas13d. AAV delivery of nuclease active Cas13d can be used to mediate RNA target knock-down in the cell type of interest. Due to its small size compared to other single-effector CRISPR nucleases, Cas13d can be packaged together with a guide RNA or an array containing multiple guide RNAs in a single AAV vector.

Example 15 Nucleic Acid-Based Diagnostics with Cas13d

Cas13d enzymes can be exploited for nucleic acid-based diagnostics within the context of a cell, using cell-free lysate derived from a cell, or a cell-free system containing an engineered Cas13d enzyme and guide RNA to facilitate formation of a ribonucleoprotein complex. Said guide RNA can be provided in the form of a pre-guide RNA, a mature guide RNA, or an array containing one or more spacer sequences. The components can also be provided in the form of a DNA or RNA precursor encoding for the Cas13d enzyme and appropriate guide RNA design via an in vitro transcription/translation system to facilitate the generation of the necessary components. These components of the diagnostic kit comprise the “sensor” module.

Such a method can be used to determine if a target RNA is present in a test sample. Such a method can also be used to detect a pathogen, such as a virus or bacteria, or diagnose a disease state, such as a cancer (e.g., wherein the target RNA is specific for a particular microbe or disease). Such a method can also be used to test the purity or identity of an environmental sample or agricultural sample, such as seed or soil.

The “sensor” module will then be challenged with a test sample in the form of RNA. Said test sample can be, but is not limited to, a genomic DNA sample that is converted into RNA (for example via in vitro transcription) or a direct RNA sample. These samples can be extracted from biological material such as patient samples (e.g., cells, tissue, blood, plasma, serum, saliva, urine, tumor biopsy, cell free DNA or RNA, exosomes, carrier vesicles or particles) and environmental samples (e.g., soil, water, air, seed, or plant samples). In one embodiment to improve diagnostic sensitivity, nucleic acid molecules in the sample are amplified using amplification techniques, such as polymerase chain reaction, recombinase polymerase amplification, loop mediated isothermal amplification, nucleic acid sequence based amplification, strand displacement amplification, rolling circle amplification, ligase chain reaction, and others (e.g., those that use isothermal amplification). Said amplification techniques can optionally employ nucleic acid conversion techniques such as transcription or reverse transcription with randomized primers or targeted primers.

If the sensor module recognizes a cognate target in the test sample, it will activate an RNase activity. This RNase activity can be detected by using a detectable label. In one example, the detectable label includes an RNA linked to a fluorophore and quencher. The intact detectable RNA links the fluorophore and quencher, suppressing fluorescence. Upon cleavage by Cas13d of the detectable RNA, the fluorophore is released from the quencher and displays detectable fluorescent activity.

In another example, cleavage of the reporter RNA releases a non-fluorescent molecule, which can be converted into a visible signal (e.g., visible by eye). In one example, cleavage of the reporter RNA releases a molecule that can be detected via lateral flow. A molecule that can be detected by lateral flow is any molecule that can be bound specifically by antibodies. In one example, the Cas13d protein along with the guide RNA detecting the target and the reporter RNA conjugated to the reporter molecule can be delivered as a single system in the form of a dry test strip. Upon incubation with the test sample, the Cas13d protein, guide RNA, and reporter RNA are rehydrated and, in the presence of the RNA target, the Cas13d protein will cleave the reporter RNA, resulting in the migration of the reporter molecule in the test strip via lateral flow and a resulting positive test line signal by binding to antibodies localized there. Such a shelf stable dry detection system not requiring special (frozen) storage could for example prove advantageous in situations where detection of a target RNA or DNA is performed outside a centralized laboratory facility such as a doctor's office, a hospital, a pharmacy, during field work, in an agricultural setting and so forth.

Cas13d is active over a broad range of temperatures, making such an application outside of a controlled laboratory environment feasible.

Example 16 Cas13d as a Diagnostic for RNA or DNA Transcribed into RNA In Vitro

Cas13d is capable of converting the presence of a matched target RNA into a visible signal in a minimal diagnostic in vitro system (FIGS. 15A-15D). (A) Cas13d is converted into an active RNase complex upon binding a target matching the spacer sequence of the guide RNA. It is capable of cleaving gRNA-complementary target RNA or non-complementary bystander RNAs. (B) Cas13d target-dependent RNase activity can be converted into a detectable signal, for example through cleavage of a labeled detector RNA that is cleaved only in the presence of a target matching the spacer of the Cas13d guide RNA. In this example, the detector RNA contains a fluorophore, ‘F’, and a quencher, ‘Q’, that abolishes fluorescence. Only upon bystander RNA cleavage is the fluorophore liberated from the quencher and fluorescence generated. (C) Cas13d from E. siraeum produces a visible signal only in the presence of a perfectly matched target and not in the presence of different mismatched targets. (D) Cas13d from R. flavefaciens strain XPD3002 produces a visible signal only in the presence of a perfectly matched target and not in the presence of different mismatched targets.

Thus, the system disclosed herein can be part of a lateral flow device (or other solid support) that can be used in diagnostics. The presence of an RNA or DNA sequence can be converted into a signal that can then be detected by conventional lateral flow.

Example 17 Cas13d Modifications

As shown in FIGS. 16A-16B, Cas13d is amenable to modifications including truncations of regions with low conservation among orthologs. The alignment of Cas13d orthologs in FIG. 16A shows regions with high (green bars) and low (red bars) conservation.

Example 18 Targeting Transcripts In Vivo

The ccdB gene in bacterial cells was targeted in vivo using different nCas1 orthologues: Eubacterium siraeum nCas1 (Es_nCas1; SEQ ID NO: 1); Eubacterium siraeum nCas1 with mutated HEPN domains (Es_nCas1 HEPN −/−; SEQ ID NO: 2); uncultured Ruminococcus sp. nCas1 (uncul_nCas1; SEQ ID NO: 3); and uncultured Ruminococcus sp. nCas1 with mutated HEPN domains (uncul_nCas1 HEPN −/−; SEQ ID NO: 4) (FIG. 17A). Chemically competent E. coli (strain BW25141-DE3) cells were transformed with (1) an arabinose-inducible ccdB plasmid, and (2) a second plasmid (targeting vector) carrying a compatible origin of replication, nCas1 protein coding sequence, and nCas1 guide array containing 4 spacer sequences targeting the ccdB transcript (FIG. 17A).

1. Artificial Eubacterium siraeum nCas1 array targeting ccdB (SEQ ID NO: 261)
a. GAACUACACCCGUGCAAAAAUGCAGGGGUCUAAAACUAACGGCUCUCUCUUUUAUAGGUGUAAACCGAACUACACCCGUGCAAAAAUGCAGGGGUCUAAAACCUUUAUCUGACAGCAGACGUGCACUGGCCAGAACUACACCCGUGCAAAAAUGCAGGGGUCUAAAACCAUCAUGCGCCAGCUUUCAUCCCCGAUAUGGAACUACACCCGUGCAAAAAUGCAGGGGUCUAAAACUAAUGGCGUUUUUGAUGUCAUUUUCGCGGUCCGCUGA
i. Full 36 nt direct repeat (SEQ ID NO: 262): GAACUACACCCGUGCAAAAAUGCAGGGGUCUAAAAC
ii. Spacer 1 (SEQ ID NO: 263): UAACGGCUCUCUCUUUUAUAGGUGUAAACC
iii. Spacer 2 (SEQ ID NO: 264): CUUUAUCUGACAGCAGACGUGCACUGGCCA
iv. Spacer 3 (SEQ ID NO: 265): CAUCAUGCGCCAGCUUUCAUCCCCGAUAUG
v. Spacer 4 (SEQ ID NO: 266): UAAUGGCGUUUUUGAUGUCAUUUUCGCGGUCCGCUGA

2. Artificial uncultured Ruminococcus sp. nCas1 array targeting ccdB (SEQ ID NO: 267)
a. CUACUACACUGGUGCAAAUUUGCACUAGUCUAAAACUAACGGCUCUCUCUUUUAUAGGUGUAAACCCUACUACACUGGUGCAAAUUUGCACUAGUCUAAAACCUUUAUCUGACAGCAGACGUGCACUGGCCACUACUACACUGGUGCAAAUUUGCACUAGUCUAAAACCAUCAUGCGCCAGCUUUCAUCCCCGAUAUGCUACUACACUGGUGCAAAUUUGCACUAGUCUAAAACUAAUGGCGUUUUUGAUGUCAUUUUCGCGGUCCGC
i. Full 36 nt direct repeat (SEQ ID NO: 268): CUACUACACUGGUGCAAAUUUGCACUAGUCUAAAAC
ii. Spacer 1 (SEQ ID NO: 269): UAACGGCUCUCUCUUUUAUAGGUGUAAACC
iii. Spacer 2 (SEQ ID NO: 270): CUUUAUCUGACAGCAGACGUGCACUGGCCA
iv. Spacer 3 (SEQ ID NO: 271): CAUCAUGCGCCAGCUUUCAUCCCCGAUAUG
v. Spacer 4 (SEQ ID NO: 272): UAAUGGCGUUUUUGAUGUCAUUUUCGCGGUCCGCUGA

3. Target RNA (ccdB sequence) (SEQ ID NO: 273)
a. AUGCAGUUUAAGGUUUACACCUAUAAAAGAGAGAGCCGUUAUCGUCUGUUUGUGGAUGUACAGAGUGAUAUUAUUGACACGCCCGGGCGACGGAUGGUGAUCCCCCUGGCCAGUGCACGUCUGCUGUCAGAUAAAGUCUCCCGUGAACUUUACCCGGUGGUGCAUAUCGGGGAUGAAAGCUGGCGCAUGAUGACCACCGAUAUGGCCAGUGUGCCGGUCUCCGUUAUCGGGGAAGAAGUGGCUGAUCUCAGCCACCGCGAAAAUGACAUCAAAAACGCCAUUAACCUGAUGUUUUGGGGAAUA
i. Spacer 1 target (SEQ ID NO: 274): GGUUUACACCUAUAAAAGAGAGAGCCGUUA
ii. Spacer 2 target (SEQ ID NO: 275): UGGCCAGUGCACGUCUGCUGUCAGAUAAAG
iii. Spacer 3 target (SEQ ID NO: 276): CAUAUCGGGGAUGAAAGCUGGCGCAUGAUG
iv. Spacer 4 target (SEQ ID NO: 277): ACCGCGAAAAUGACAUCAAAAACGCCAUUA
v. dCas9 target (underlined) (SEQ ID NO: 259): UUAUCGUCUGUUUGUGGAUG

The transformed bacteria were plated on 2 mM arabinose plates to induce ccdB expression and harvested after 24 hours. Total RNA was extracted with Trizol followed by random hexamer-mediated reverse transcription and Taqman probe-based qPCR.
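The relationship between the spacers and target sites listed in this example can be checked computationally: each spacer should be the reverse complement of its target site, and the array should be a concatenation of direct repeat and spacer units. The following is a minimal sketch using the Eubacterium siraeum sequences given above; the function names are illustrative, and the observation that only the first 30 nt of spacer 4 pair with its listed 30 nt target site is drawn from the listed sequences rather than stated in the text.

```python
# Watson-Crick complement table for RNA.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def build_array(direct_repeat: str, spacers: list[str]) -> str:
    """Concatenate direct repeat + spacer units, mimicking an unprocessed array."""
    return "".join(direct_repeat + spacer for spacer in spacers)


DIRECT_REPEAT = "GAACUACACCCGUGCAAAAAUGCAGGGGUCUAAAAC"  # SEQ ID NO: 262

SPACERS = [
    "UAACGGCUCUCUCUUUUAUAGGUGUAAACC",         # SEQ ID NO: 263
    "CUUUAUCUGACAGCAGACGUGCACUGGCCA",         # SEQ ID NO: 264
    "CAUCAUGCGCCAGCUUUCAUCCCCGAUAUG",         # SEQ ID NO: 265
    "UAAUGGCGUUUUUGAUGUCAUUUUCGCGGUCCGCUGA",  # SEQ ID NO: 266 (37 nt as listed)
]

TARGET_SITES = [
    "GGUUUACACCUAUAAAAGAGAGAGCCGUUA",  # SEQ ID NO: 274
    "UGGCCAGUGCACGUCUGCUGUCAGAUAAAG",  # SEQ ID NO: 275
    "CAUAUCGGGGAUGAAAGCUGGCGCAUGAUG",  # SEQ ID NO: 276
    "ACCGCGAAAAUGACAUCAAAAACGCCAUUA",  # SEQ ID NO: 277
]

if __name__ == "__main__":
    for spacer, site in zip(SPACERS, TARGET_SITES):
        # Compare against the first len(site) bases of the spacer.
        matches = reverse_complement(site) == spacer[:len(site)]
        print(site, "->", "complementary" if matches else "mismatch")
    print(len(build_array(DIRECT_REPEAT, SPACERS)), "nt array")
```

The same check can be applied to the uncultured Ruminococcus sp. array by substituting its direct repeat (SEQ ID NO: 268).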

nCas1 proteins in which the two HEPN domains were mutated showed no targeting of ccdB (FIG. 17B, HEPN −/−), while active wild-type nCas1 proteins knocked down the expression of ccdB (FIG. 17B, Es_nCas1; uncul_nCas1). Because transcriptional repression mediated by dCas9 (catalytically inactive, or “dead” Cas9) through binding of the DNA sequence downstream of the transcriptional start site is a current standard in the field, the assay was validated by targeting dCas9 to the ccdB promoter to repress transcription of the ccdB gene. Taken together, these data demonstrate guide RNA-specific degradation of target transcripts in vivo within prokaryotic cells in a HEPN domain-dependent manner.

In view of the many possible embodiments to which the principles of the disclosure may be applied, it should be recognized that the illustrated embodiments are only examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.

We claim:
 1. A method of splicing or perturbing the splicing of one or more target RNA molecules, comprising: contacting one or more target RNA molecules with a non-naturally occurring or engineered clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) system comprising: at least one isolated protein, or a nucleic acid molecule encoding the at least one isolated protein, wherein the at least one isolated protein comprises at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 310, 311, 312, or 313; and at least one gRNA that hybridizes with the one or more target RNA molecules, or at least one nucleic acid molecule encoding the gRNA, whereby the isolated protein forms a complex with the gRNA, wherein the gRNA directs the complex to the one or more target RNA molecules thereby splicing or perturbing the splicing of the one or more target RNA molecules.
 2. The method of claim 1, wherein the at least one gRNA comprises one or more direct repeat (DR) sequences comprising at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 129, 130, 131, 132, 133, 134, 135, 136, 137, 148, 150, 151, 152, 154, 156, 157, 159, 161, 163, 165, 167, 169, 176, 178, 180, 182, 184, 186, 188, 190, 191, 192, 193, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, or 254, or a truncated version thereof.
 3. The method of claim 1, wherein the one or more target RNA molecules is a non-coding RNA.
 4. The method of claim 1, wherein the nucleic acid molecule encoding the at least one isolated protein comprises: at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 124, 125, 126, 127, 128, 139, 140 or 141; at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 142, 143, 144, or 145; or encodes a protein sequence comprising at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 138, 147, 149, 153, 155, 158, 160, 162, 164, 166, 168, 170, 175, 177, 179, 181, 183, 185, 187, 189, 194, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 310, 311, 312, or 313.
 5. The method of claim 1, wherein the method treats a disease, and wherein the one or more target RNA molecules is associated with a disease.
 6. The method of claim 1, wherein the at least one isolated protein further comprises: a subcellular localization signal; a mutation in at least one native HEPN domain; or both a subcellular localization signal and a mutation in at least one native HEPN domain.
 7. The method of claim 6, wherein the at least one isolated protein further comprises a mutated HEPN1 domain or a mutated HEPN2 domain.
 8. The method of claim 6, wherein the mutation in the at least one native HEPN domain comprises a mutation in a sequence comprising RXXXXH, wherein the R of the sequence is mutated, the H of the sequence is mutated, or both the R and the H of the sequence are mutated.
 9. The method of claim 1, wherein the nucleic acid molecule encoding the at least one isolated protein is part of a recombinant vector.
 10. The method of claim 9, wherein: the recombinant vector comprises a plasmid or viral vector; the isolated nucleic acid molecule is operably linked to a promoter; and/or the recombinant vector further comprises at least one gRNA.
 11. The method of claim 1, wherein contacting the one or more target RNA molecules with the non-naturally occurring or engineered CRISPR-Cas system comprises introducing into a cell containing the one or more target RNA molecules the non-naturally occurring or engineered CRISPR-Cas system.
 12. The method of claim 11, wherein the CRISPR-Cas system is introduced into the cell using endocytosis, a liposome, a particle, an exosome, a microvesicle, a gene gun, electroporation, a virus, or combinations thereof.
 13. The method of claim 11, wherein the cell is a eukaryotic cell.
 14. The method of claim 11, wherein the cell is a non-bacterial cell.
 15. The method of claim 1, wherein the method is performed ex vivo, in vitro, or in a cell-free system.
 16. The method of claim 1, wherein the gRNA further comprises one or more spacer sequences specific for the one or more target RNA molecules, a nucleic acid aptamer, or both.
 17. The method of claim 1, wherein the at least one isolated protein and the one or more target RNA molecules are part of a ribonucleoprotein (RNP) complex.
 18. The method of claim 1, wherein the at least one isolated protein is catalytically inactive for ribonuclease activity.